Factorized Multi-Modal Topic Model
Seppo Virtanen 1, Yangqing Jia 2, Arto Klami 1, Trevor Darrell 2
1 Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University
2 UC Berkeley EECS and ICSI

Abstract

Multi-modal data collections, such as corpora of paired images and text snippets, require analysis methods beyond single-view component and topic models. For continuous observations the current dominant approach is based on extensions of canonical correlation analysis, factorizing the variation into components shared by the different modalities and those private to each of them. For count data, multiple variants of topic models attempting to tie the modalities together have been presented. All of these, however, lack the ability to learn components private to one modality, and consequently will try to force dependencies even between minimally correlating modalities. In this work we combine the two approaches by presenting a novel HDP-based topic model that automatically learns both shared and private topics. The model is shown to be especially useful for querying the contents of one domain given samples of the other.

1 INTRODUCTION

Analysis of objects represented by multiple modalities has been an active research direction over the past few years. If the analysis of a single modality is characterized as learning some sort of components that describe the data, the task in analysis of multiple modalities can be summarized as learning components that describe both the variation within each modality and the variation shared between them (Klami and Kaski, 2008; Jia et al., 2010). The fundamental problem is in learning how to correctly factorize the variation into the shared and private components, so that the components can be intuitively interpreted.
For continuous vector-valued samples the problem can be solved efficiently by a structural sparsity assumption (Jia et al., 2010; Virtanen et al., 2011), resulting in an extension of canonical correlation analysis (CCA) that models not only the correlations but also components private to each modality. One prototypical example of multi-modal analysis is that of modeling collections of images and associated text snippets, such as captions or contents of a web page. While both text and image content can naturally be represented with bag-of-words-type vectors, the assumptions made by the above methods fail. Instead, such count data calls for topic models such as latent Dirichlet allocation (LDA): several extensions of LDA have been presented for multi-modal setups, including Blei and Jordan (2003); Mimno and McCallum (2008); Salomatin et al. (2009); Yakhnenko and Honavar (2009); Rasiwasia et al. (2010) and Putthividhya et al. (2011). However, none of these extensions is able to find shared and private topics in the same sense as the CCA-based models do for continuous data. Instead, the models attempt to enforce strong correlation between the modalities, which is a reasonable assumption when analyzing e.g. multi-lingual textual corpora with similar languages, but one that does not hold for analysis of images associated with free-flowing text. In most cases, the images will contain a considerable amount of information not related to the text snippet, and it is not even guaranteed that the text is related at all to the visual content of the image. In this work, we introduce a novel topic model that combines the two above lines of work. It builds on the correlated topic models (CTM) by Blei and Lafferty (2007) and Paisley et al. (2011), by modeling correlations between topic allocations and by using a hierarchical Dirichlet process (HDP) formulation for automatically learning the number of topics.
The proposed factorized multi-modal topic model integrates the technical improvements of these single-modality topic models into the multi-modal application, and in particular automatically learns to make some topics specific to each of the modalities, implementing the factorization idea of Klami and Kaski (2008) and Jia et al. (2010) used for continuous data. The component selection plays a crucial role in implementing this property, implying that the HDP-based technique for automatically selecting the complexity is even more important for factorized multi-modal models than it would be for a regular topic model. The primary advantage of the new model is that it does not enforce correlations between the modalities, like the earlier multi-modal topic models do, but instead factorizes the variation into interpretable topics describing shared and private structure. The model is very flexible and does not enforce any particular factorization structure, but instead learns it from the data. For example, the model can completely ignore the shared topics in case the modalities are independent, or find almost solely shared topics when they are strongly correlated. In this work we demonstrate the model in analyzing modalities that have only weak relationships, a scenario for which the previous models would not work. In particular, we analyze a collection of Wikipedia pages that consist of images and the whole text on the page. Such a collection has relatively low between-modality correlation and in particular includes a considerable amount of text that is not related to the image at all, necessitating topics private to the text modality. The proposed model is shown to clearly outperform alternative HDP-based topic models as well as correspondence LDA (Blei and Jordan, 2003) in the task of inferring the contents of a missing modality.

2 BACKGROUND: TOPIC MODELS

To briefly summarize the topic models and to introduce the notation used in the paper, we describe the standard topic model of Latent Dirichlet Allocation (LDA) (Blei et al., 2003) through its generative process. We assume that words occurring in a document are drawn from K topics.
Each topic k specifies a multinomial probability distribution over the vocabulary, parameterized through η_k drawn from the Dirichlet distribution Dir(γ1), and the topic proportions are multinomial with parameters θ ∼ Dir(ν1). The documents are generated by repeatedly sampling a topic indicator z ∼ Multi(θ) and then drawing a word from the corresponding topic as x ∼ Multi(η_z). We will also heavily depend on the concept of correlated topic models (CTM) (Blei and Lafferty, 2007). In the standard LDA the topic proportions θ drawn from the Dirichlet distribution are independent except for the weak negative correlation stemming from the normalization constraint. CTM replaces this choice by the logistic normal distribution, first drawing an auxiliary variable from a Gaussian distribution ξ ∼ N(µ, Σ) and specifying the topic distribution as θ ∝ exp(ξ). The topics become correlated when Σ is not diagonal, and empirical experiments show increased predictive accuracy. Finally, our model will be formulated through a hierarchical Dirichlet process (HDP) formulation (Teh et al., 2006), to enable automatic choice of the number of topics. As mentioned in the introduction, the choice is even more critical for multi-modal models, since we will have several sets of topics instead of just a single one; specifying the complexity for all of those in advance would not be feasible. Our model will use elements from the recently introduced Discrete Infinite Logistic Normal (DILN) model by Paisley et al. (2011), which incorporates HDP into CTM. The key idea of DILN is that the topic distributions θ are made sparse by multiplying exp(ξ) by sparse topic-selection terms. The topic distribution is given by θ_k ∼ Gamma(βp_k, exp(−ξ_k)), where both β and p_k come from a stick-breaking process: β is the second-level concentration parameter, and p_k = V_k ∏_{i=1}^{k−1} (1 − V_i), where V_k ∼ Beta(1, α) with α as the first-level concentration parameter. The expected value of θ_k is proportional to βp_k exp(ξ_k), illustrating the way the different parameters influence the topic weights.
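As a concrete illustration, the stick-breaking and gamma construction described above can be sketched in a few lines of NumPy. This is a minimal sketch with made-up hyperparameter values, not a reference implementation; the Gamma is parameterized with rate exp(−ξ_k), which matches the stated mean βp_k exp(ξ_k).

```python
import numpy as np

rng = np.random.default_rng(0)

K = 20          # truncation level for the stick-breaking process (arbitrary)
alpha = 2.0     # first-level concentration (made up for the sketch)
beta = 5.0      # second-level concentration (made up for the sketch)

# Stick-breaking weights: p_k = V_k * prod_{i<k} (1 - V_i), with V_k ~ Beta(1, alpha)
V = rng.beta(1.0, alpha, size=K)
p = V * np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))

# Correlated auxiliary variables xi ~ N(mu, Sigma); an arbitrary dense Sigma here
mu = np.zeros(K)
Sigma = 0.5 * np.eye(K) + 0.1
xi = rng.multivariate_normal(mu, Sigma)

# Topic weights theta_k ~ Gamma(beta * p_k, rate exp(-xi_k)),
# i.e. scale exp(xi_k), so that E[theta_k] = beta * p_k * exp(xi_k)
theta = rng.gamma(shape=beta * p, scale=np.exp(xi))
theta /= theta.sum()   # normalize into a topic distribution
```

The decaying stick weights p_k are what make only a finite prefix of the topics effectively active.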
For any finite data collection, p_k > 0 only for a finite subset of topics and hence the model automatically selects the number of topics.

3 FACTORIZED MULTI-MODAL TOPIC MODEL

Consider a collection of documents each containing M weakly correlated modalities, where each modality has its own vocabulary. In the application of this paper the two vocabularies are textual and visual words collected from Wikipedia pages with text and a single image (though the model would directly generalize to multiple images). We introduce a novel multi-modal topic model that can be used to learn dependencies between these modalities, enabling e.g. predicting the textual content associated with a novel image. The problem is made particularly challenging by the weak relationship between the modalities; several of the documents will contain large amounts of text not related to the image content. For modeling the data, we will use M separate vocabularies, so that words (or visual words) for each modality are drawn from separate dictionaries η^(m) specific to each view m. The topic proportions θ^(m) will also be specific to each modality, whereas the actual words are sampled independently for each modality given the topic proportions. The essential modeling question is then how the topic proportions are tied with each other, in order to achieve the factorization into shared and private topics. In brief, we will do this by (i) modeling dependencies between topics both within and across modalities and (ii) automatically selecting the number of topics for each type (shared or private to any of the modalities). The topic proportions θ^(m) are made dependent by introducing auxiliary variables ξ^(m), denoting by ξ = (ξ^(1), ..., ξ^(M)) the concatenation of them, and using the CTM prior ξ ∼ N(µ, Σ). This part of the model corresponds to the multi-field CTM with different topic sets by Salomatin et al. (2009), and the different blocks in Σ describe different types of dependencies between the topic proportions. In particular, the blocks around the diagonal describe dependencies between the topic proportions within each modality, whereas the off-diagonal blocks describe dependencies in topic proportions between the modalities. Having a CTM for the joint topic distribution is not yet sufficient for separating the shared topics from private ones, since we can only control the correlation between the topic proportions. A large correlation between two topics of different modalities would imply that the topic is shared, but lack of correlation (that is, Σ_kl = 0) would not make either component private. Instead, the weights would simply be determined independently. To create separate sets of shared and private topics we need to be able to switch some of the topics off in one or more of the modalities, similarly to how Jia et al. (2010) and Virtanen et al. (2011) switch off components to make the same distinction in continuous data models. In the case of the multi-field CTM this could only be done by driving µ_k (the mean of the Gaussian prior for ξ_k) towards minus infinity, which is not encouraged by the model and is difficult to achieve with mean-field updates.
We implement the shared/private choice by separate HDPs, one for each modality, switching a subset of topics off for each modality separately by a mechanism similar to how the single-view DILN model (Paisley et al., 2011) selects the topics. We introduce β^(m) and p^(m) for each modality m = 1, ..., M, and draw them from separate HDPs, resulting in θ^(m)_k ∼ Gamma(β^(m) p^(m)_k, exp(−ξ^(m)_k)) as the final topic proportions. The topic distributions are still coupled through the ξ^(m) that were drawn from a single high-dimensional Gaussian, but for each modality the stick weights p^(m) select different subsets of topics to be switched off. In the end, a finite number of topics remain for each modality, and the private topics can be identified as ones that have non-zero weight for one modality and are not correlated with topics active in other modalities.

Figure 1: A graphical representation of the factorized multi-modal topic model. The data has D documents described by M modalities. For each modality, the words x^(m) are drawn from a dictionary specific to that modality, according to topic proportions θ^(m) also specific to the modality. The topic proportions are generated by a logistic transformation of latent variables ξ^(m) that model the correlations between the topics both within and across modalities, followed by topic selection with a HDP (denoted by V and β in the plate; see text for details) for each modality. As a result, the model learns both topics modeling correlations between the modalities as well as topics private to each modality.

The final generative model motivated by the above discussion results in a collection of M correlated BOW data sets X^(m), generated as follows (see Figure 1 for a graphical representation). For the whole collection we:
- create a dictionary of T^(m) topics for each modality by drawing η^(m)_k ∼ Dir(γ^(m) 1) for k = 1, ..., T^(m)
- draw the parameters α^(m), β^(m), V^(m) of the DILN distribution for each modality from the stick-breaking formulation and construct p^(m).
For each document we then draw ξ ∼ N(µ, Σ) and partition it into the different modalities as ξ = (ξ^(1), ..., ξ^(M)). For each modality, we then generate the words independently as follows:
- form the topic proportions by drawing Y^(m)_k ∼ Gamma(β^(m) p^(m)_k, exp(−ξ^(m)_k)) and setting θ^(m)_k = Y^(m)_k / Σ_{i=1}^{T^(m)} Y^(m)_i
- draw N^(m) words by choosing a topic z ∼ Multi(θ^(m)) and drawing a word x ∼ Multi(η^(m)_z).
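The whole generative process above can be sketched end to end for M = 2 modalities. All sizes and hyperparameters below are made up for illustration; the sketch follows the steps listed in the text but makes no claim about the settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

M, T, D = 2, 10, 3                  # modalities, truncated topics per modality, documents
vocab = [50, 40]                    # vocabulary size per modality (made up)
N = [30, 20]                        # words per document per modality (made up)
gamma, alpha, beta = 0.1, 2.0, 5.0  # hyperparameters (made up)

# Per-modality topic dictionaries eta^(m)_k ~ Dir(gamma 1)
eta = [rng.dirichlet(gamma * np.ones(vocab[m]), size=T) for m in range(M)]

# Per-modality stick weights p^(m) from separate stick-breaking processes
p = []
for m in range(M):
    V = rng.beta(1.0, alpha, size=T)
    p.append(V * np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1])))

# A single Gaussian over the concatenated xi = (xi^(1), xi^(2))
mu = np.zeros(M * T)
Sigma = 0.5 * np.eye(M * T) + 0.05  # dense off-diagonal blocks couple the modalities

docs = []
for d in range(D):
    xi = rng.multivariate_normal(mu, Sigma)
    words = []
    for m in range(M):
        xi_m = xi[m * T:(m + 1) * T]
        Y = rng.gamma(shape=beta * p[m], scale=np.exp(xi_m))
        theta = Y / Y.sum()                                  # normalized proportions
        z = rng.choice(T, size=N[m], p=theta)                # topic indicators
        x = [rng.choice(vocab[m], p=eta[m][k]) for k in z]   # words
        words.append(x)
    docs.append(words)
```

Because each modality has its own stick weights p^(m), a topic can receive essentially zero mass in one modality while staying active in the other, which is exactly the shared/private behavior discussed above.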
3.1 INFERENCE

For learning the model parameters we use a truncated variational approximation following closely the algorithm given by Paisley et al. (2011), the main difference being that we have M separate sets of η, β and p, one for each modality. The above generative process is truncated by setting V^(m)_{T^(m)} = 1, forcing the stick lengths beyond the truncation level T^(m) to be zero, and the resulting factorized approximation is given by

Q = ∏_{m=1}^{M} ∏_{d=1}^{D} [ ∏_{n_m=1}^{N_m} q(z^(m)_{d,n_m}) ] [ ∏_{k=1}^{T} q(Y^(m)_{d,k}) ] q(ξ^(m)_d) q(η^(m)) q(V^(m)) q(α_m) q(β_m) q(µ) q(Σ),

where to simplify notation we assume T_m = T for all m. The algorithm proceeds by updating each factor in turn while keeping the others fixed, using either gradient ascent or an analytic solution for maximizing the lower bound of the approximation for each of the terms (see Paisley et al. (2011) for details). The main difference in the algorithms comes from updating ξ, since in our case it ranges over M sets of topics instead of just one, yet the activities within each set are governed by separate HDPs. We use a diagonal Gaussian factor q(ξ) = N(ξ̃, diag(ṽ)), where ṽ denotes the variances of the dimensions, and use gradient ascent for jointly updating the parameters. To simplify notation we use ξ and v to denote the expectation and variance of the factorial distribution. The relevant part of the lower bound is

L_{ξ,v} = − Σ_{m=1}^{M} β^(m) p^(m)T ξ^(m) − Σ_{m=1}^{M} E[θ^(m)]^T E[exp(−ξ^(m))] − (ξ − µ)^T Σ^{−1} (ξ − µ)/2 − diag(Σ^{−1})^T v/2 + log(v)^T 1/2.   (1)

Here Σ^{−1} couples the separate ξ^(m) terms in the partial derivatives as

∂L_{ξ,v}/∂ξ^(m) = −β^(m) p^(m) + E[θ^(m)] ∘ E[exp(−ξ^(m))] − (Σ^{−1})_{m,m} (ξ^(m) − µ^(m)) − Σ_{j≠m} (Σ^{−1})_{m,j} (ξ^(j) − µ^(j)),

with (Σ^{−1})_{i,j} denoting the block of Σ^{−1} corresponding to modalities i and j. The inverse of Σ remains constant during the gradient ascent, and hence only needs to be evaluated once for every time the factor q(ξ) is updated. We use maximum marginal likelihood to update µ and Σ, resulting in the closed-form updates

µ = (1/D) Σ_{d=1}^{D} ξ_d,
Σ = (1/D) Σ_{d=1}^{D} ( (ξ_d − µ)(ξ_d − µ)^T + diag(v_d) ).
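The closed-form updates for µ and Σ are simple moment computations over the variational means and variances. A minimal sketch with synthetic values standing in for the variational parameters (the xi and v arrays are made up; in the real algorithm they come from q(ξ)):

```python
import numpy as np

rng = np.random.default_rng(2)

D, K = 100, 6                          # documents, total topics across modalities
xi = rng.normal(size=(D, K))           # variational means of q(xi_d), one row per document
v = rng.gamma(1.0, 0.1, size=(D, K))   # variational variances (diagonal of q(xi_d))

# mu = (1/D) sum_d xi_d
mu = xi.mean(axis=0)

# Sigma = (1/D) sum_d ((xi_d - mu)(xi_d - mu)^T + diag(v_d))
centered = xi - mu
Sigma = (centered.T @ centered + np.diag(v.sum(axis=0))) / D
```

Adding the diag(v_d) terms keeps Σ positive definite even when the means alone would give a rank-deficient sample covariance.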
3.2 PREDICTION

The model structure is well suited for prediction tasks, where the goal is to infer missing modalities for a new document given that one of them is observed (e.g. infer the caption given the image content). This is because the correlations between the topic proportions provide a direct link between the modalities, and the private topics explain away all the variation that is not useful for predictions. Here we present the details of the prediction for the special case with just one observed modality (j) and one missing modality (i). Given the observed data we first infer the topic proportions θ̂^(j) and then the auxiliary variable ξ̂^(j) by maximizing a cost similar to (1), but only using the newly inferred topic proportions of the observed modality and the corresponding part of Σ. As ξ̂ comes from a Gaussian distribution we can infer ξ̂^(i) given ξ̂^(j) with the standard conditional expectation as

ξ̂^(i) = µ^(i) + Σ_{i,j} Σ_{j,j}^{−1} (ξ̂^(j) − µ^(j))   (2)
      = µ^(i) + W (ξ̂^(j) − µ^(j)).

Here W involves the corresponding parts of the between-topic covariance matrix Σ as indicated above, and can be seen as a projection matrix transforming the components of one modality to another. Finally, the newly estimated ξ̂^(i) for the missing views is converted back to the expected topic proportions θ̂^(i) by exponentiation and multiplying with the corresponding stick lengths p^(i).

3.3 SHARED AND PRIVATE TOPICS

The key novelty of the model is its capability to learn both topics that are shared and topics that are private to each modality, without needing to specify them in advance. Since the way these topics appear is by no means transparent in the above formulation, we will here discuss the property in more detail. In brief, the distinct nature of the topics comes from an interplay of the correlations between the topics of different modalities and the HDP procedure that turns some of the topics off
for each modality. In particular, neither of these properties alone would be sufficient. As mentioned already in Section 3, merely having separate ξ^(m) drawn from a single Gaussian is not sufficient for finding private topics. At best, the correlation structure can specify that the weights will be independent for the modalities. Next we explain how the other key element of the model, separate selection of active topics for each modality, is not sufficient alone either. We do that by considering a special case of the model that assumes equal ξ = ξ^(m) for all views but has separate stick-breaking processes switching some of the topics off for each of the views. We call this alternative model mmDILN, due to the way it implements the multi-modal LDA of Blei and Jordan (2003) with DILN-style component selection. Intuitively, the mmDILN model could find private topics simply by setting p^(m)_k to a small value for topics that are not needed in that modality. However, it cannot make correct predictions from one modality to another, and hence fails in achieving one of the primary goals of shared-private factorizations. If p^(m)_k is small then the model has no information for inferring ξ_k from that view, and hence also all other elements ξ_l that correlate with ξ_k will be incorrect. If ξ_k was an important topic for the other view, the predictions will be severely biased. Our model avoids this issue by having the separate ξ^(m) parameters, leading to correct across-modality predictions as described in the previous section. In the experimental section we will empirically compare the proposed model with mmDILN, demonstrating how mmDILN indeed has very poor predictive accuracy despite modeling the training data almost as well. Hence, even though the structure is in principle sufficient for learning private topics, the model has no practical value as a shared-private factorization.
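The across-modality prediction that mmDILN fails at is, in our model, the conditional Gaussian expectation of Eq. (2). A minimal sketch with a synthetic covariance (the dimensions, Σ, µ and the stick lengths below are all made up; ξ̂^(j) stands in for the value inferred from the observed modality):

```python
import numpy as np

rng = np.random.default_rng(3)

Ti, Tj = 4, 5                       # topics in the missing (i) and observed (j) modality
A = rng.normal(size=(Ti + Tj, Ti + Tj))
Sigma = A @ A.T + np.eye(Ti + Tj)   # a random positive-definite joint covariance
mu = rng.normal(size=Ti + Tj)

mu_i, mu_j = mu[:Ti], mu[Ti:]
Sigma_ij = Sigma[:Ti, Ti:]          # cross-modality block
Sigma_jj = Sigma[Ti:, Ti:]          # observed-modality block

xi_hat_j = rng.normal(size=Tj)      # stand-in for xi inferred from the observed modality

# Eq. (2): xi_hat_i = mu_i + Sigma_ij Sigma_jj^{-1} (xi_hat_j - mu_j)
W = Sigma_ij @ np.linalg.inv(Sigma_jj)
xi_hat_i = mu_i + W @ (xi_hat_j - mu_j)

# Back to topic proportions: exponentiate and weight by the stick lengths p^(i)
p_i = np.full(Ti, 1.0 / Ti)         # placeholder stick lengths
theta_hat = p_i * np.exp(xi_hat_i)
theta_hat /= theta_hat.sum()
```

W depends only on Σ, so it can be precomputed once and reused for every test document.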
In order to recognize the nature of each of the topics, we need to look at both the covariance Σ between the topic weights and the modality-specific stick weights p^(m). Since the topics can be (potentially strongly) correlated both within and across modalities, we can identify private topics only by searching for topics that do not correlate with any topic that would be active in any other modality. In the experiments we demonstrate how the topics can be ranked according to how strongly they are shared with another modality, by inspecting the elements of Σ.

4 RELATED WORK

In this section we relate the model to other approaches for modeling multi-modal count data.

4.1 MULTI-MODAL TOPIC MODELS

The multi-modal extension of LDA (mmLDA) by Blei and Jordan (2003) and its non-parametric version mmHDP by Yakhnenko and Honavar (2009) assume all modalities share the same topic proportions, and essentially extend LDA only by having separate dictionaries for each modality and generating the words for the domains independently. For many real-world data sets the assumption of identical topic proportions is too strong, and the model tries to enforce correlations even when they do not exist. While the assumption may help in picking up topics that would be weak in either modality alone, it makes identifying the true correlations almost impossible. Such models fail especially when modeling data having strong private topics in one modality. Since the topic proportions are shared, the topic must be present in the other modalities as well, and there it becomes associated with a dictionary that merely replicates the overall distribution of the words. Such topics are particularly harmful for prediction tasks. When the dictionary of a topic matches that of the background word distribution, it will be present in every document in that modality.
For example, when predicting text from images we could learn to associate politics (a strong topic private to the text modality) with the overall visual word distribution, resulting in all of the predictions including terms from the politics topic. Salomatin et al. (2009) took a step towards our model with their multi-field CTM. It extends CTM by introducing separate ξ^(m) for each modality, similarly to our model. However, as described in the previous section, the separate topic proportions are not yet sufficient for separating the shared topics from the private ones.

4.2 CONDITIONAL TOPIC MODELS

Much recent work on multi-modal topic modeling has focused on building conditional models, largely for the image annotation task. Correspondence LDA (corrLDA), proposed simultaneously with mmLDA by Blei and Jordan (2003), is a prominent example, assuming that the image is generated first and the text depends on the image content. Both modalities are assumed to share the same topic weights. While such models are very useful for modeling the conditional relationship, they do not treat the modalities symmetrically as our model does. Recently Putthividhya et al. (2011) proposed an extension of corrLDA, replacing the identical topic distributions with a regression module from image topics to the textual annotation topics. The added flexibility results in better predictive performance, but the model remains a directional one,
in contrast to our model, which generates all modalities with equal importance. For applications treating only two modalities and having a specific task that makes one of them more important (say, image annotation), the conditional models often work well. However, they do not easily generalize to multiple modalities and are not flexible in terms of the eventual application. Other conditional models focus on conditioning on meta-data, such as author or link structure (Mimno and McCallum, 2008; Hennig et al., 2012). Such models allow integrating data that are not necessarily in count format, but the same distinction of directional versus generative applies. However, this family of models could be integrated with our solution, incorporating a meta-data link into our multi-modal model. In essence, the choice of whether meta-data is modeled or not is independent of the choice of how many count data modalities the data has.

4.3 CANONICAL CORRELATIONS

As described earlier, the model bears a close resemblance to how CCA models correlations between continuous data, the similarities being most apparent with the recent re-interpretations of CCA as shared-private factorization (Klami and Kaski, 2008; Jia et al., 2010). The technical details of the solutions are, however, very different, as the normalization of topic proportions makes the techniques used for continuous data infeasible for topic models. Despite the mismatch of data types, CCA can be used for modeling count data as well. The most promising direction would be to apply kernel CCA, but there are no obvious choices for a kernel function that would directly match the analysis of image-text pairs. As one practical remedy, Rasiwasia et al. (2010) combined CCA and LDA directly by first estimating a separate LDA model for each modality and then combining the resulting topic proportions with CCA. Our approach does not rely on such two separate analysis steps, which do not result in directly interpretable private topics.
5 EXPERIMENTS AND RESULTS

5.1 DATA AND MEASURES

We validate the model on real data collected from Wikipedia.1 We constructed a data collection with D = 20,000 documents, each consisting of a single image represented with 5000 SIFT patches and text (the contents of the whole Wikipedia page) represented with a vocabulary of the 7500 most frequent terms after stopword removal. We made a random 50/50 split into test and training data. To demonstrate the ability of the proposed model to correctly model the relationships between the two modalities, we evaluate the model with the conditional perplexity of a missing modality for a new sample:

P^(m)_train = exp( − Σ_{d ∈ D_train} log p(x^(m)_d) / Σ_{d ∈ D_train} N^(m)_d ),
P^(i|j)_test = exp( − Σ_{d ∈ D_test} log p(x^(i)_d | x^(j)_d) / Σ_{d ∈ D_test} N^(i)_d ),

where x^(m)_d denotes the concatenation of the N^(m)_d words. These quantities measure how well the model can relate the visual content to the textual content, corresponding to the document completion task of Wallach et al. (2009) but computed across modalities. We compare our model to three alternatives representing various kinds of multi-modal topic models: mmDILN (Section 3.3), mmHDP (Section 4.1) and corrLDA (Section 4.2). Both mmDILN and mmHDP are comparable to our model in making automatic topic number selection and modeling both modalities symmetrically. Consequently, the experiments will focus on demonstrating the importance of finding the correct factorization into shared and private topics. corrLDA is included as an example of a conditional model that gives an alternative approach to solving a similar prediction task. Note that we need to learn two separate corrLDA models, one for predicting text from images and one for the other direction, whereas the other models can do both types of predictions. For corrLDA we use 100 topics (the truncation level we use for the nonparametric models).

1 Available from ~jiayq/

5.2 INFERENCE SPEED

First we show that the variational approximation used for inference is efficient.
Figure 2 shows how the algorithm converges for both N = 400 and N = 10,000 documents already after some tens of iterations. For both experiments we used a maximum of T = 100 topics. The convergence of mmHDP and mmDILN is similar (not shown).

5.3 PREDICTING TEXT FROM IMAGES AND VICE VERSA

Figure 3 shows the evaluation on the training and test sets for the proposed model and the comparison methods, measured as the perplexity on training data and the conditional perplexity of images given the text and text given the images. The proposed method, which is more flexible than the alternatives, reaches better
(lower) perplexity on the training and test data due to being able to describe the variation not shared by the other modality without needing to introduce noise topics.

Figure 2: Training perplexity as a function of algorithm iterations, for the text and image modalities with 400 and 10,000 documents.

A notable observation is that the baseline methods perform worse at predicting text from images as the amount of training data increases. This illustrates clearly the fundamental problem in modeling multi-modal collections without separate private topics. Since the text documents are easier to model than the images, the alternative models start to focus more and more on modeling the text when there is a large amount of data. The dominant topics start describing the text alone, yet they are also active in the image modality, but with a dictionary that does not contain any information. Given a new image sample, the estimated topic proportions will be arbitrary and hence do not enable meaningful prediction. The proposed model, however, learns to make those textual topics private to the text modality, while capturing the weaker correlations between the two modalities with shared topics. The model still cannot predict textual information not correlated with the image content, but it learns correctly not to even attempt that, and manages to make accurate predictions for the aspects that are correlated.

5.4 SHARED AND PRIVATE TOPICS

To illustrate how the HDP formulation chooses the topics, we visualize the stick parameters p in Figure 4. First, we notice that the last sticks have close to zero weight, indicating that the chosen truncation level T = 100 is sufficient. More importantly, we see that the weights for the text and image topics are different (the image topics are more spread out), motivating the choice of separate weights for the modalities. To further understand how the proposed model is able to find both shared and private topics, we explore the nature of the individual topics.
Since the SIFT vocabulary is not easily interpretable by visual inspection, we illustrate the property for the textual topics. For each textual topic we measure the amount of correlation with the other modality by inspecting the correlation structure in Σ, and then rank the topics according to this measure.

Figure 4: Visualization of the stick parameters p of the proposed model for the text modality (a) and the image modality (b) reveals how they are not identical for the two modalities. Both figures show the weights for two models learned with 400 and 10,000 documents, revealing how the distribution is learned fairly accurately already from a small collection.

This results in a ranked list of the text topics, the first ones being strongly shared by the two modalities while the last ones are private to the text modality. More specifically, denoting the separate blocks in the covariance matrix as

Σ = ( Σ_{t,t}  Σ_{t,i} ; Σ_{i,t}  Σ_{i,i} ),   (3)

we convert it to a correlation matrix Ω, threshold small values out (we use a threshold of 0.2) and extract the cross-correlation between the textual (rows) and visual topics (columns) to get Ω_{t,i}. Then for each textual topic we define the visual relevance ρ as the row mean of the absolute values of Ω_{t,i}.2 This quantity captures general and rich visual combinations that co-occur with the textual topics, and it is worth noticing how general the measure is: it allows multiple visual topics to correlate with one textual topic (and vice versa), and includes both positive and negative correlations, which are typically equally relevant (negative correlation can be seen as the absence of a visual component); see Figure 5 for a demonstration. The textual topics are ranked according to ρ in Figure 6. There are a few very strongly shared topics between the text and image modalities, and at the end of the list we have several topics private to the text modality, indicated by zero correlation with the image modality.
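The visual relevance computation just described can be sketched directly from the blocks of Σ. The covariance below is synthetic (random, positive definite) and the topic counts are made up; the steps — covariance to correlation, thresholding at 0.2, row means of absolute cross-correlations — follow the text.

```python
import numpy as np

rng = np.random.default_rng(4)

Tt, Ti = 8, 6                       # number of text and image topics (made up)
A = rng.normal(size=(Tt + Ti, Tt + Ti))
Sigma = A @ A.T + np.eye(Tt + Ti)   # stand-in for the learned covariance

# Covariance -> correlation matrix Omega
s = np.sqrt(np.diag(Sigma))
Omega = Sigma / np.outer(s, s)

# Threshold small correlations and extract the text-vs-image block Omega_ti
Omega = np.where(np.abs(Omega) < 0.2, 0.0, Omega)
Omega_ti = Omega[:Tt, Tt:]

# Visual relevance rho: row mean of absolute cross-correlations
rho = np.abs(Omega_ti).mean(axis=1)
ranking = np.argsort(-rho)          # most visually relevant text topics first
```

Topics at the tail of the ranking, with rho near zero, are the candidates for being private to the text modality.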
This matches the intuition that the full text of a Wikipedia page cannot be mapped to the image content in all cases. Table 1 summarizes the six text topics most strongly correlating with the image modality, as well as six topics that are private

2 We also tried using the maximum element instead of the mean; it results in a fairly similar ranking.
to the text modality, revealing very clear interpretations. The most strongly correlating topic covers airplanes, which are known to be easy to recognize from images due to their distinct shapes and backgrounds. The second topic is about maps, which also have a clear visual correspondence, and the other strongly correlated topics also cover clearly visual concepts like buildings, cars and railroads. The topics private to the text domain, in turn, are about concepts with no clear visual counterpart: economy, politics, history and research. In summary, the model has separated the components nicely into shared and private ones, and provides additional interpretability beyond regular multi-modal topic models.

Figure 3: Training and test perplexities (lower is better) for the two modalities. For training data we show the perplexity of modeling the text (a) and the images (b) separately. For test data, we show the conditional perplexity of predicting text from images (c) and predicting images from text (d), corresponding to the document completion task used for evaluating topic models. The proposed method outperforms the comparison ones in all respects. The comparison methods mmHDP, mmDILN and corrLDA, which are not able to extract topics private to either modality, are not able to learn good predictive models, demonstrated especially by the error increasing as a function of training samples in (c). The image prediction perplexity for mmDILN is outside the range depicted in (d), above 5400 for all training set sizes.

Figure 5: Illustration of part of the cross-correlation between text topics and image topics, corresponding to a subset of Ω_{t,i}, where yellow represents positive correlations and blue represents negative ones. The size of the boxes corresponds to the absolute value.
Figure 6: Text topics ordered according to visual relevance ρ (the vertical axis shows image relevance for the ordered text topics). We see that there are a few strongly correlating topics, and that the model has found roughly 10 topics that are private to the text domain. Note that such topics may still be important for modeling the whole multi-modal corpus, even though they do not contribute to the cross-modal information transfer.

6 DISCUSSION

Our paper ties together two separate lines of work on the analysis of multi-modal data. In particular, we created a novel multi-modal topic model which extends earlier tools for the analysis of multi-modal count data by incorporating elements found useful in the continuous-valued case. We explained how learning topics private to each modality is of crucial importance when modeling modalities with potentially weak correlations, and
Table 1: Text topics ranked according to visual relevance, summarized by the words with highest probability. The topic indices match the ranking in Figure 6. The shared topics have clear visual counterparts, whereas the private ones do not relate to any kind of visual content.

Shared topics
T1: airport flight airlines air international aircraft aviation terminal passengers airline boeing flights airways service airports passenger accident
T2: format ms lat m longm latm longs lats launched mi broken mill sold renamed dec captured rapids class feet coordinates built lake located
T3: building house built buildings street hall st century tower houses west designed design castle south north east side main square large end site
T4: car engine cars model models ford engines race rear series front racing wheel year driver speed vehicles vehicle production hp motor drive
T5: retrieved album song music video released single awards number billboard chart top release mtv songs media love show uk jackson hot albums
T6: line railway station rail trains train service lines bus transport services system railways stations built railroad passenger main metro transit

Topics private to the text domain
T95: president washington post united american national states secretary december november september times military dc kennedy press security
T96: ottoman turkish turkey kosovo armenian war greek serbia bulgarian serbian government border bulgaria turks forces croatian albanian republic
T97: research science development institute university management scientific technology design world national engineering work human international
T98: government state national european policy council international states members act union political countries system nations article parliament
T99: nuclear weapons anti power protest bomb people protests united protesters government strike peace states march reactor atomic april test
T100: economic trade economy world production industry oil million growth development government agricultural market agriculture industrial

demonstrated
empirically how such a property can only be obtained by combining two separate elements: modeling correlations between separate topic weights for each modality, and learning modality-specific indicators switching unnecessary topics off. For implementing these elements we combine state-of-the-art techniques in topic models, integrating the DILN distribution (Paisley et al., 2011) into a model similar to the multi-field correlated topic model of Salomatin et al. (2009), to create an efficient learning algorithm readily applicable to relatively large document collections.

Acknowledgements

AK and SK were supported by the COIN Finnish Center of Excellence and the FuNeSoMo exchange project. AK was additionally supported by the Academy of Finland (decision number ) and the PASCAL2 European Network of Excellence.

References

Blei, D., Ng, A. and Jordan, M. (2003). Latent Dirichlet allocation. JMLR, 3.
Blei, D. and Jordan, M. (2003). Modeling annotated data. In SIGIR.
Blei, D. and Lafferty, J. (2007). A correlated topic model of science. Annals of Applied Statistics, 1.
Hennig, P., Stern, D., Herbrich, R. and Graepel, T. (2012). Kernel topic models. In AISTATS.
Jia, Y., Salzmann, M. and Darrell, T. (2010). Factorized latent spaces with structured sparsity. In NIPS 23.
Klami, A. and Kaski, S. (2008). Probabilistic approach to detecting dependencies between data sets. Neurocomputing, 72(1-3).
Mimno, D. and McCallum, A. (2008). Topic models conditioned on arbitrary features with Dirichlet-multinomial regression. In UAI.
Paisley, J., Wang, C. and Blei, D. (2011). Discrete infinite logistic normal distribution. In AISTATS.
Putthividhya, D., Attias, H. and Nagarajan, S. (2011). Topic-regression multi-modal latent Dirichlet allocation for image annotation. In CVPR.
Putthividhya, D., Attias, H. and Nagarajan, S. (2009). Independent factor topic models. In ICML.
Rasiwasia, N., Pereira, J., Coviello, E., Doyle, G., Lanckriet, G., Levy, R. and Vasconcelos, N. (2010). A new approach to cross-modal multimedia retrieval. In ACM Multimedia.
Salomatin, K., Yang, Y. and Lad, A.
(2009). Multi-field correlated topic modeling. In SDM.
Teh, Y., Blei, D. and Jordan, M. (2006). Hierarchical Dirichlet processes. JASA, 101(476).
Virtanen, S., Klami, A. and Kaski, S. (2011). Bayesian CCA via structured sparsity. In ICML.
Wallach, H.M., Murray, I., Salakhutdinov, R. and Mimno, D. (2009). Evaluation methods for topic models. In ICML.
Yakhnenko, O. and Honavar, V. (2009). Multi-modal hierarchical Dirichlet process model for predicting image annotation and image-object label correspondence. In SDM.
More information