Supplementary Material: Weakly Supervised Learning of Heterogeneous Concepts in Videos
Sohil Shah, Kuldeep Kulkarni, Arijit Biswas, Ankit Gandhi, Om Deshmukh, and Larry S. Davis

Expectation Constraints

In a Bayesian framework, the effective constraints are defined as expectations [1,2] of the original constraints from the main paper and can be rewritten as follows. For all i ∈ {1, ..., M},

\[ \sum_{j=1}^{N_i} \mathbb{E}_q\big[z^i_{js}\, z^i_{ja}\big] \ge 1 - \xi^i_{s,a}, \qquad \forall (s,a) \in \Gamma_i \tag{S1} \]

\[ \sum_{j=1}^{N_i} \mathbb{E}_q\big[z^i_{js}\big] \ge 1 - \xi^i_{s,\cdot}, \qquad \forall (s,\cdot) \in \Gamma_i \tag{S2} \]

\[ \sum_{j=1}^{N_i} \mathbb{E}_q\big[z^i_{ja}\big] \ge 1 - \xi^i_{\cdot,a}, \qquad \forall (\cdot,a) \in \Gamma_i \tag{S3} \]

and for all i ∈ {1, ..., M} and j ∈ {1, ..., N_i},

\[ \mathbb{E}_q\big[z^i_{js}\big] = 0 \quad \text{if } (s,\cdot) \notin \Gamma_i \text{ and } (s,a) \notin \Gamma_i \ \forall a \in A \tag{S4} \]

\[ \mathbb{E}_q\big[z^i_{ja}\big] = 0 \quad \text{if } (\cdot,a) \notin \Gamma_i \text{ and } (s,a) \notin \Gamma_i \ \forall s \in S \tag{S5} \]

where the expectation is taken w.r.t. the variational posterior distribution q. Note that, through π^i_a, the samples of z^i_{ja} depend on previously sampled latent coefficients such as z^i_{js}. This complicates the applicability of the constraint in equation (S1). However, due to the independence assumption, restricting the search to the family of tractable posterior distributions simplifies the constraints in equations (S1)-(S5) to the following. For all i ∈ {1, ..., M},

\[ \sum_{j=1}^{N_i} \nu^i_{js}\, \nu^i_{ja} \ge 1 - \xi^i_{s,a}, \qquad \forall (s,a) \in \Gamma_i \tag{S6} \]

\[ \sum_{j=1}^{N_i} \nu^i_{js} \ge 1 - \xi^i_{s,\cdot}, \qquad \forall (s,\cdot) \in \Gamma_i \tag{S7} \]

\[ \sum_{j=1}^{N_i} \nu^i_{ja} \ge 1 - \xi^i_{\cdot,a}, \qquad \forall (\cdot,a) \in \Gamma_i \tag{S8} \]

and for all i ∈ {1, ..., M} and j ∈ {1, ..., N_i},

\[ \nu^i_{js} = 0 \quad \text{if } (s,\cdot) \notin \Gamma_i \text{ and } (s,a) \notin \Gamma_i \ \forall a \in A \tag{S9} \]

\[ \nu^i_{ja} = 0 \quad \text{if } (\cdot,a) \notin \Gamma_i \text{ and } (s,a) \notin \Gamma_i \ \forall s \in S \tag{S10} \]
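The reduction from (S1) to (S6) uses only the factorized Bernoulli form of the variational posterior; as a worked step (a sketch in the notation above, with ν^i_{jk} = E_q[z^i_{jk}]):

```latex
% Under the fully factorized family q(Z) = \prod_{j,k} q(z^i_{jk}),
% with z^i_{jk} \sim \mathrm{Bernoulli}(\nu^i_{jk}), the joint
% expectation in (S1) factorizes into a product of means:
\mathbb{E}_q\!\big[z^i_{js}\, z^i_{ja}\big]
  = \mathbb{E}_q\!\big[z^i_{js}\big]\,\mathbb{E}_q\!\big[z^i_{ja}\big]
  = \nu^i_{js}\,\nu^i_{ja}
\quad\Longrightarrow\quad
\sum_{j=1}^{N_i} \nu^i_{js}\,\nu^i_{ja} \;\ge\; 1 - \xi^i_{s,a}.
```

The same substitution applied to (S2)-(S5) yields (S7)-(S10) directly.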
Derivation of Posterior Update Equations

Note that the constraints in (S6)-(S10) can be rewritten as hinge-loss functions and added to the variational objective from the main paper. Hence the final formulation is given by

\[
\min_{\nu^i, \tau^i, \Phi_k, \sigma_k} \; \mathrm{KL}\big(w(Y)\,\big\|\,\Psi(Y|\Theta)\big) - \sum_{i=1}^{M} \sum_{j=1}^{N_i} \sum_{e \in \{s,a\}} \int \log p\big(x^{ei}_j \,\big|\, Y, \Theta\big)\, w(Y)\, dY
\]
\[
+\; C \sum_{i=1}^{M} \Bigg[ \sum_{\substack{J \in \Gamma_i \\ J=(s,a)}} \max\Big(0,\; 1 - \sum_{j=1}^{N_i} \nu^i_{js}\nu^i_{ja}\Big) + \sum_{\substack{J \in \Gamma_i \\ J=(s,\cdot)}} \max\Big(0,\; 1 - \sum_{j=1}^{N_i} \nu^i_{js}\Big) + \sum_{\substack{J \in \Gamma_i \\ J=(\cdot,a)}} \max\Big(0,\; 1 - \sum_{j=1}^{N_i} \nu^i_{ja}\Big) \Bigg] \tag{S11}
\]

subject to constraints (S9) and (S10) for all i ∈ {1, ..., M} and j ∈ {1, ..., N_i}.

The objective function in (S11) can be rewritten as

\[
\mathcal{L}(\nu^i, \tau^i, \Phi_k, \sigma^2_k) = \mathcal{L}_1 - \sum_{i=1}^{M} \sum_{j=1}^{N_i} \mathcal{L}^i_{2j} + C \sum_{i=1}^{M} \sum_{k=1}^{K_a+K_s} \mathcal{H}^i_k \tag{S12}
\]

where \mathcal{L}_1 represents the KL-divergence term, \mathcal{L}^i_{2j} denotes the likelihood term, and \mathcal{H}^i_k is the term corresponding to the hinge-loss function for ν^i. Expanding \mathcal{L}^i_{2j}, we get

\[
\mathcal{L}^i_{2j} = \mathbb{E}_w\big[\log p(x^{si}_j \mid Y, \Theta) + \log p(x^{ai}_j \mid Y, \Theta)\big] \tag{S13}
\]
\[
= -\frac{1}{2\sigma^2_{ns}}\Big( x^{si\,T}_j x^{si}_j - 2\,\mathbb{E}_w\big[z^i_{j\cdot} A_s\big]\, x^{si}_j + \mathbb{E}_w\big[z^i_{j\cdot} U_s\, z^{i\,T}_{j\cdot}\big] \Big) - \frac{D_s}{2}\log\big(2\pi\sigma^2_{ns}\big)
\]
\[
\quad -\frac{1}{2\sigma^2_{na}}\Big( x^{ai\,T}_j x^{ai}_j - 2\,\mathbb{E}_w\big[z^i_{j\cdot} A_a\big]\, x^{ai}_j + \mathbb{E}_w\big[z^i_{j\cdot} U_a\, z^{i\,T}_{j\cdot}\big] \Big) - \frac{D_a}{2}\log\big(2\pi\sigma^2_{na}\big) \tag{S14}
\]

where U = \mathbb{E}_w[A A^T] is a K_max × K_max matrix with U_{kk} = D\sigma^2_k + \Phi_{k\cdot}\Phi^T_{k\cdot} and U_{kk'} = \Phi_{k\cdot}\Phi^T_{k'\cdot} for k ≠ k'; \mathbb{E}_w[z^i_{j\cdot} A]\, x^i_j = \sum_k \nu^i_{jk}\, \Phi_{k\cdot}\, x^i_j; and

\[
\mathbb{E}_w\big[z^i_{j\cdot} U z^{i\,T}_{j\cdot}\big] = 2 \sum_{k' < k} \nu^i_{jk'}\, \nu^i_{jk}\, U_{k'k} + \sum_{k} \nu^i_{jk} \big( D\sigma^2_k + \Phi_{k\cdot}\Phi^T_{k\cdot} \big).
\]
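To make the hinge-loss penalty in (S11) concrete, the sketch below evaluates the three penalty terms for one video directly from the posterior means. This is a minimal illustration assuming a dense array layout; the names (nu_s, nu_a, the three label lists encoding Γ_i) are our own, not the authors' implementation.

```python
import numpy as np

def hinge_penalty(nu_s, nu_a, pair_labels, subj_labels, act_labels, C=1.0):
    """Hinge-loss penalty for one video, cf. eq. (S11) (illustrative sketch).

    nu_s : (N_i, K_s) array of posterior means nu^i_{js} (subject concepts)
    nu_a : (N_i, K_a) array of posterior means nu^i_{ja} (action concepts)
    pair_labels : (s, a) index pairs in Gamma_i
    subj_labels : subject indices s with (s, .) in Gamma_i
    act_labels  : action indices a with (., a) in Gamma_i
    """
    penalty = 0.0
    # Paired labels: some proposal j should carry both concepts jointly,
    # i.e. sum_j nu^i_{js} nu^i_{ja} >= 1 (constraint S6).
    for s, a in pair_labels:
        penalty += max(0.0, 1.0 - float(np.dot(nu_s[:, s], nu_a[:, a])))
    # Subject-only labels (constraint S7).
    for s in subj_labels:
        penalty += max(0.0, 1.0 - nu_s[:, s].sum())
    # Action-only labels (constraint S8).
    for a in act_labels:
        penalty += max(0.0, 1.0 - nu_a[:, a].sum())
    return C * penalty
```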
For the KL-divergence term, we get

\[
\mathcal{L}_1 = \mathrm{KL}\big(w(Y)\,\big\|\,\Psi(Y|\Theta)\big) = \mathrm{KL}\big(w(Z)\,\big\|\,\Psi(Z|\Theta)\big) + \mathrm{KL}\big(w(v)\,\big\|\,\Psi(v|\Theta)\big) + \mathrm{KL}\big(w(A_s)\,\big\|\,\Psi(A_s|\Theta)\big) + \mathrm{KL}\big(w(A_a)\,\big\|\,\Psi(A_a|\Theta)\big) \tag{S15}
\]

where the individual terms are

\[
\mathrm{KL}\big(w(v)\,\big\|\,\Psi(v|\Theta)\big) = \sum_{i=1}^{M} \Bigg\{ \sum_{k=1}^{K_{\max}} \bigg[ \big(\tau^i_{k1} - \alpha\big)\Big(\Psi(\tau^i_{k1}) - \Psi(\tau^i_{k1} + \tau^i_{k2})\Big) + \big(\tau^i_{k2} - 1\big)\Big(\Psi(\tau^i_{k2}) - \Psi(\tau^i_{k1} + \tau^i_{k2})\Big) - \log\frac{\Gamma(\tau^i_{k1})\,\Gamma(\tau^i_{k2})}{\Gamma(\tau^i_{k1} + \tau^i_{k2})} \bigg] - K_{\max}\log\alpha \Bigg\} \tag{S16}
\]

\[
\mathrm{KL}\big(w(Z)\,\big\|\,\Psi(Z|\Theta)\big) = \sum_{i=1}^{M} \sum_{j=1}^{N_i} \sum_{k=1}^{K_{\max}} \bigg[ -\nu^i_{jk} \sum_{m=1}^{k} \Big(\Psi(\tau^i_{m1}) - \Psi(\tau^i_{m1} + \tau^i_{m2})\Big) - \big(1 - \nu^i_{jk}\big)\, \mathbb{E}_w\Big[\log\Big(1 - \prod_{m=1}^{k} v^i_m\Big)\Big] + \nu^i_{jk}\log\nu^i_{jk} + \big(1 - \nu^i_{jk}\big)\log\big(1 - \nu^i_{jk}\big) \bigg] \tag{S17}
\]

\[
\mathrm{KL}\big(w(A)\,\big\|\,\Psi(A|\Theta)\big) = \sum_{k=1}^{K_{\max}} \bigg[ \frac{D\sigma^2_k + \Phi_{k\cdot}\Phi^T_{k\cdot}}{2\sigma^2_A} - \frac{D}{2}\Big(1 + \log\frac{\sigma^2_k}{\sigma^2_A}\Big) \bigg] \tag{S18}
\]

where Ψ(·) is the digamma function. As shown for the original IBP in [3], the term \mathbb{E}_w[\log(1 - \prod_{m=1}^{k} v^i_m)] is approximated by its lower bound

\[
\mathbb{E}_w\Big[\log\Big(1 - \prod_{m=1}^{k} v^i_m\Big)\Big] \;\ge\; \sum_{m=1}^{k} q_{km}\,\Psi(\tau^i_{m2}) + \sum_{m=1}^{k} \Big(\sum_{n=m+1}^{k} q_{kn}\Big)\Psi(\tau^i_{m1}) - \sum_{m=1}^{k} \Big(\sum_{n=m}^{k} q_{kn}\Big)\Psi(\tau^i_{m1} + \tau^i_{m2}) + H(q_{k\cdot}) \;=\; \mathcal{L}_k \tag{S19}
\]

where the variational parameter q_{k·} = (q_{k1}, ..., q_{kk}) is a k-point probability mass function and H(q_{k·}) denotes the entropy of q_{k·}. The tightest bound is obtained by setting

\[
q_{km} = \frac{1}{Z_k} \exp\bigg( \Psi(\tau^i_{m2}) + \sum_{n=1}^{m-1} \Psi(\tau^i_{n1}) - \sum_{n=1}^{m} \Psi(\tau^i_{n1} + \tau^i_{n2}) \bigg)
\]

where Z_k is the normalization factor that makes q_{k·} a valid distribution. On replacing the term \mathbb{E}_w[\log(1 - \prod_{m=1}^{k} v^i_m)] with its lower bound \mathcal{L}_k, we obtain an upper bound on \mathrm{KL}(w(Y)\,\|\,\Psi(Y|\Theta)).
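As a numerical illustration of the bound (S19) and the optimal q_{k·}, the sketch below computes \mathcal{L}_k with SciPy's digamma. The (K_max, 2) layout of the Beta parameters τ^i and the function name are our assumptions for this sketch.

```python
import numpy as np
from scipy.special import digamma

def multinomial_lower_bound(tau, k):
    """Lower bound L_k on E[log(1 - prod_{m<=k} v^i_m)], cf. eq. (S19) and [3].

    tau : (K_max, 2) array of Beta parameters (tau^i_{m1}, tau^i_{m2})
    k   : number of sticks in the product (1-indexed)
    """
    t1, t2 = tau[:k, 0], tau[:k, 1]
    # Optimal k-point distribution q_{k.} (up to the normalizer Z_k).
    log_q = (digamma(t2)
             + np.concatenate(([0.0], np.cumsum(digamma(t1))[:-1]))
             - np.cumsum(digamma(t1 + t2)))
    q = np.exp(log_q - log_q.max())
    q /= q.sum()                               # divide by Z_k
    # Tail sums: sum_{n=m}^{k} q_{kn} and sum_{n=m+1}^{k} q_{kn}.
    tail_incl = np.cumsum(q[::-1])[::-1]
    tail_excl = np.concatenate((tail_incl[1:], [0.0]))
    entropy = -np.sum(q * np.log(q + 1e-12))   # H(q_{k.})
    return (q @ digamma(t2) + tail_excl @ digamma(t1)
            - tail_incl @ digamma(t1 + t2) + entropy)
```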
On substituting equations (S13)-(S19) into (S12), the optimal values of the parameters of the mean-field variational approximate posterior distribution are obtained by setting the derivatives of (S12) w.r.t. those parameters to zero and solving for all parameters simultaneously using the KKT conditions. We derive the following equations, which are solved iteratively:

\[
\sigma^2_{ke} = \bigg( \frac{1}{\sigma^2_{Ae}} + \frac{1}{\sigma^2_{ne}} \sum_{i=1}^{M} \sum_{j=1}^{N_i} \nu^i_{jk} \bigg)^{-1}, \qquad e \in \{s, a\} \tag{S20}
\]

\[
\Phi^e_{k\cdot} = \frac{\sigma^2_{ke}}{\sigma^2_{ne}} \sum_{i=1}^{M} \sum_{j=1}^{N_i} \nu^i_{jk} \Big( x^{ei}_j - \sum_{l \neq k} \nu^i_{jl}\, \Phi^e_{l\cdot} \Big)^T, \qquad e \in \{s, a\} \tag{S21}
\]

\[
\tau^i_{k1} = \alpha + \sum_{m=k}^{K_{\max}} \sum_{j=1}^{N_i} \nu^i_{jm} + \sum_{m=k+1}^{K_{\max}} \Big( N_i - \sum_{j=1}^{N_i} \nu^i_{jm} \Big) \Big( \sum_{s=k+1}^{m} q^i_{ms} \Big) \tag{S22}
\]

\[
\tau^i_{k2} = 1 + \sum_{m=k}^{K_{\max}} \Big( N_i - \sum_{j=1}^{N_i} \nu^i_{jm} \Big)\, q^i_{mk} \tag{S23}
\]

The above equations are similar to those given by the variational approximation for the IBP [3]. The update equation for ν, however, differs completely; it is given by

\[
\nu^i_{jk} = \frac{L^i_k}{1 + e^{-\zeta^i_{jk}}} \tag{S24}
\]

\[
\zeta^i_{jk} = \sum_{t=1}^{k} \Big( \Psi(\tau^i_{t1}) - \Psi(\tau^i_{t1} + \tau^i_{t2}) \Big) - \mathcal{L}_k - \frac{1}{2\sigma^2_{na}}\Big( D_a \sigma^2_{ka} + \Phi^a_{k\cdot}\Phi^{aT}_{k\cdot} \Big) + \frac{1}{\sigma^2_{na}}\, \Phi^a_{k\cdot}\Big( x^{ai}_j - \sum_{l \neq k} \nu^i_{jl}\, \Phi^a_{l\cdot} \Big)^T
\]
\[
\quad - \frac{1}{2\sigma^2_{ns}}\Big( D_s \sigma^2_{ks} + \Phi^s_{k\cdot}\Phi^{sT}_{k\cdot} \Big) + \frac{1}{\sigma^2_{ns}}\, \Phi^s_{k\cdot}\Big( x^{si}_j - \sum_{l \neq k} \nu^i_{jl}\, \Phi^s_{l\cdot} \Big)^T + C \sum_{\substack{J \in \Gamma_i \\ J=(s,k)}} \nu^i_{js}\, \mathbb{I}\Big\{ \textstyle\sum_{l=1}^{N_i} \nu^i_{ls}\, \nu^i_{lk} < 1 \Big\}
\]
\[
\quad + C \sum_{\substack{J \in \Gamma_i \\ J=(k,a)}} \nu^i_{ja}\, \mathbb{I}\Big\{ \textstyle\sum_{l=1}^{N_i} \nu^i_{lk}\, \nu^i_{la} < 1 \Big\} + C\, \mathbb{I}\Big\{ \textstyle\sum_{l=1}^{N_i} \nu^i_{lk} < 1 \Big\}, \qquad \forall k \in \{1, \ldots, K_a + K_s\} \tag{S25}
\]

where L^i_k ∈ {0, 1} and \mathbb{I}{·} is an indicator variable. L^i_k indicates whether the entity (action/subject) k is part of the i-th video's label set Γ_i or not; this in turn enforces ν^i_{jk} = 0 for all ν satisfying eq. (S9) and eq. (S10).
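For concreteness, here is a minimal sketch of the global coordinate-ascent updates (S20)-(S21) and the logistic update (S24) for one modality e ∈ {s, a}. The dense array layout (all videos' proposals stacked row-wise) and the function names are our assumptions, not the authors' implementation.

```python
import numpy as np

def update_factors(X, nu, Phi, sigma_A2, sigma_n2):
    """Coordinate-ascent updates (S20)-(S21) for one modality (sketch).

    X        : (N, D) stacked proposal features x^{ei}_j over all videos
    nu       : (N, K) posterior means nu^i_{jk} for the same proposals
    Phi      : (K, D) current factor means Phi^e_{k.}, updated in place
    sigma_A2 : prior variance sigma^2_{Ae}
    sigma_n2 : observation-noise variance sigma^2_{ne}
    """
    K = nu.shape[1]
    # Posterior variances, eq. (S20).
    sigma_k2 = 1.0 / (1.0 / sigma_A2 + nu.sum(axis=0) / sigma_n2)
    for k in range(K):
        others = np.delete(np.arange(K), k)
        # Residual of X after the mean reconstruction by all other factors.
        resid = X - nu[:, others] @ Phi[others]
        # Posterior means, eq. (S21).
        Phi[k] = (sigma_k2[k] / sigma_n2) * (nu[:, k] @ resid)
    return sigma_k2, Phi

def update_nu(zeta, L):
    """Logistic update nu^i_{jk} = L^i_k / (1 + exp(-zeta^i_{jk})), eq. (S24).

    zeta : natural parameters from eq. (S25); L : 0/1 label mask L^i_k.
    """
    return L / (1.0 + np.exp(-zeta))
```

In practice these updates are swept together with the τ updates (S22)-(S23) until the variational bound converges.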
The hyperparameters σ²_n and σ²_A can either be set a priori or estimated from the data. The empirical estimates are easily derived by maximizing the expected log-likelihood, analogous to the maximization step of the EM algorithm. The closed-form solutions are given by

\[
\sigma^2_A = \frac{\sum_{k=1}^{K_{\max}} \big( D\sigma^2_k + \Phi_{k\cdot}\Phi^T_{k\cdot} \big)}{K_{\max}\, D} \tag{S26}
\]

\[
\sigma^2_n = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N_i} \Big( x^{i\,T}_j x^i_j - 2\,\mathbb{E}_w\big[z^i_{j\cdot} A\big]\, x^i_j + \mathbb{E}_w\big[z^i_{j\cdot} U z^{i\,T}_{j\cdot}\big] \Big)}{\sum_{i=1}^{M} N_i\, D} \tag{S27}
\]

The final algorithm is summarized in Algorithm 1 of the main paper.

Additional Experimental Results

In this section we present additional results that provide further insight into the experiments.

Casablanca: The person-class confusion matrix is shown in Figure 1. It shows that our approach learns an appearance model for each person with high accuracy, and that it can learn from only a few weakly annotated samples.

Fig. 1. Person class confusion matrix. BG denotes the background class, which can represent any unknown face.

A2D: Figure 2 shows additional qualitative results. Red boxes represent the generated proposals, green boxes represent the proposals selected by the WSC-SIIBP algorithm, and magenta boxes represent the ground-truth annotations. Where proposal boxes overlap, only the last plotted rectangle is visible; the boxes were plotted in the following order: red first, then magenta, then green. Additionally, we have attached videos alongside this supplementary material depicting the generated proposals and their automatic selection.
Tags shown: {ball, rolling}, {dog, running}; {baby, walking}, {human}; {human}, {bird, climbing}; {dog, walking}, {human, walking}, {car}; {bird, eating}, {cat}

Fig. 2. Qualitative results of weakly supervised concept localization on the A2D dataset using the WSC-SIIBP algorithm. Tags are the weak paired-label input for each video.

References

1. Zhu, J., Chen, N., Xing, E.P.: Bayesian inference with posterior regularization and applications to infinite latent SVMs. Journal of Machine Learning Research (2014)
2. Ganchev, K., Graça, J., Gillenwater, J., Taskar, B.: Posterior regularization for structured latent variable models. Journal of Machine Learning Research (2010)
3. Doshi, F., Miller, K., Van Gael, J., Teh, Y.W.: Variational inference for the Indian buffet process. In: International Conference on Artificial Intelligence and Statistics (2009)