Real-Time Computerized Annotation of Pictures
1 Real-Time Computerized Annotation of Pictures. Jia Li, James Z. Wang. The Pennsylvania State University.
2 How Visible Are Web Images? Keukenhof photos
3 ALIPR: Automatic Linguistic Indexing of Pictures - Real Time. Example annotations (one image per group): plant, flower, landscape, people, tulip; tree, plant, people, water, garden; flower, plant, lake, rural, building; animal, people, wild-life, dog, landscape.
4 Overview. System: the ALIPR system, real-time automatic annotation of pictures; human evaluation on Web images. Learning methodology: D2-clustering; bags of weighted vectors, i.e., discrete distributions with variable supports; mixture modeling via mapping to a hypothetical space.
5 Architecture for Training
6 Architecture for Annotation
7 Image Knowledge Base
8 Six Hundred Semantic Categories. Corel image database, 80 images per category. Each category is described by several words, e.g., autumn, tree, landscape, lake. A total of 332 distinct words.
9 Feature Extraction. Color components: LUV. Texture features: wavelet coefficients.
12 Region Segmentation and Signature Formulation
13 Region Segmentation and Signature Formulation. An image signature resides in $\Omega = \Omega_1 \times \Omega_2$. Color distribution: $\beta_{i,1} \in \Omega_1$. Texture distribution: $\beta_{i,2} \in \Omega_2$. Each component is a discrete distribution $\beta_{i,j} = \{(v_{i,j}^{(1)}, p_{i,j}^{(1)}), \ldots, (v_{i,j}^{(m_{i,j})}, p_{i,j}^{(m_{i,j})})\}$.
14 Mallows Distance between Distributions. Let the two discrete distributions be $\gamma_i = \{(z_i^{(1)}, q_i^{(1)}), (z_i^{(2)}, q_i^{(2)}), \ldots, (z_i^{(m_i)}, q_i^{(m_i)})\}$, $i = 1, 2$. The Mallows distance $D(\gamma_1, \gamma_2)$ is defined by
$$D^2(\gamma_1, \gamma_2) = \min_{\{w_{i,j}\}} \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} w_{i,j} \|z_1^{(i)} - z_2^{(j)}\|^2$$
subject to $\sum_{j=1}^{m_2} w_{i,j} = q_1^{(i)}$, $i = 1, \ldots, m_1$; $\sum_{i=1}^{m_1} w_{i,j} = q_2^{(j)}$, $j = 1, \ldots, m_2$; and $w_{i,j} \ge 0$ for all $i, j$. Solved by linear programming.
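To make the linear program concrete, here is a minimal Python sketch (not part of the original slides) that computes the Mallows distance with SciPy's linprog; the function name mallows_distance and the toy signatures are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def mallows_distance(z1, q1, z2, q2):
    """Mallows distance between discrete distributions {(z1[i], q1[i])}
    and {(z2[j], q2[j])}, solved as a transportation linear program."""
    m1, m2 = len(q1), len(q2)
    # Cost vector: cost[i*m2 + j] = ||z1[i] - z2[j]||^2.
    cost = ((z1[:, None, :] - z2[None, :, :]) ** 2).sum(axis=2).ravel()
    A_eq = np.zeros((m1 + m2, m1 * m2))
    for i in range(m1):              # row marginals: sum_j w[i, j] = q1[i]
        A_eq[i, i * m2:(i + 1) * m2] = 1.0
    for j in range(m2):              # column marginals: sum_i w[i, j] = q2[j]
        A_eq[m1 + j, j::m2] = 1.0
    b_eq = np.concatenate([q1, q2])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return float(np.sqrt(res.fun))   # res.fun is the optimal D^2(gamma_1, gamma_2)

# Toy example: two small signatures with 3-D (e.g., LUV-like) support points.
z1 = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]); q1 = np.array([0.5, 0.5])
z2 = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.5, 0.5, 0.5]])
q2 = np.array([0.3, 0.3, 0.4])
print(mallows_distance(z1, q1, z2, q2))
```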
15 Profiling Image Concepts via Mixture Models
16 Mixture Modeling via Local Mapping. Mixture modeling for space $\Omega$: carve $\Omega$ into cells by clustering; map each cell to a Euclidean space, preserving pairwise distances; model the mapped points by a Gaussian. By contrast, earlier approaches treat images as a grid of feature vectors (Gaussian mixture, 2-D HMM).
17 D2-Clustering. An image set $B = \{\beta_i : \beta_i \in \Omega, i = 1, \ldots, n\}$, with $\Omega = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_d$. Distance between arrays of discrete distributions: $D(\beta_i, \beta_j) = \sum_{l=1}^{d} D^2(\beta_{i,l}, \beta_{j,l})$. Optimize: a set of prototypes $A = \{\alpha_i : \alpha_i \in \Omega, i = 1, \ldots, m\}$ and a prototype assignment function $c(i) \in \{1, 2, \ldots, m\}$, $i = 1, \ldots, n$.
18 D2-Clustering Optimization Criterion:
$$L(B, A, c) = \min_{A} \min_{c} \sum_{i=1}^{n} D(\beta_i, \alpha_{c(i)}).$$
K-means clusters vectors; D2-clustering generalizes it to bags of weighted vectors.
19 D2-Clustering Algorithm.
1. For every image $i$, set $c(i) = \arg\min_{j=1,\ldots,m} D(\beta_i, \alpha_j)$.
2. Let $C_j = \{i : c(i) = j\}$, $j = 1, \ldots, m$. That is, $C_j$ contains the indices of images assigned to prototype $j$. For each prototype $j$, set $\alpha_j = \arg\min_{\alpha \in \Omega} \sum_{i \in C_j} D(\beta_i, \alpha)$ (this is the challenging step).
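A minimal Python sketch of this alternation, under the assumption that signatures and prototypes share a common representation; dist and refit_prototype are hypothetical placeholders (e.g., the Mallows distance above and the LP-based update of the next slide), not code from the authors.

```python
import numpy as np

def d2_clustering(signatures, prototypes, dist, refit_prototype, n_iter=20):
    """Outer D2-clustering loop: alternate nearest-prototype assignment
    (Step 1) with per-cluster prototype refitting (Step 2)."""
    prototypes = list(prototypes)
    assign = [0] * len(signatures)
    for _ in range(n_iter):   # a convergence test on L(B, A, c) could replace this
        # Step 1: assign each image to its closest prototype.
        assign = [int(np.argmin([dist(beta, alpha) for alpha in prototypes]))
                  for beta in signatures]
        # Step 2: refit each prototype on its assigned images (the hard step).
        for j in range(len(prototypes)):
            members = [b for b, c in zip(signatures, assign) if c == j]
            if members:
                prototypes[j] = refit_prototype(members, prototypes[j])
    return prototypes, assign
```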
20 Update of Prototypes.
1. For every image $i$, set $c(i) = \arg\min_{j=1,\ldots,m} D(\beta_i, \alpha_j)$.
2. Let $C_\eta = \{i : c(i) = \eta\}$, $\eta = 1, \ldots, m$. Update each $\alpha_{\eta,l}$, $\eta = 1, \ldots, m$, $l = 1, \ldots, d$, individually by the following steps. Denote $\alpha_{\eta,l} = \{(z_{\eta,l}^{(1)}, q_{\eta,l}^{(1)}), (z_{\eta,l}^{(2)}, q_{\eta,l}^{(2)}), \ldots, (z_{\eta,l}^{(m_{\eta,l})}, q_{\eta,l}^{(m_{\eta,l})})\}$.
2.1 Fix $z_{\eta,l}^{(k)}$, $k = 1, \ldots, m_{\eta,l}$. Update $q_{\eta,l}^{(k)}$ and $w_{k,j}^{(i)}$, $i \in C_\eta$, $k = 1, \ldots, m_{\eta,l}$, $j = 1, \ldots, m_{i,l}$, by solving the linear programming problem
$$\min_{q_{\eta,l}^{(k)}} \min_{w_{k,j}^{(i)}} \sum_{i \in C_\eta} \sum_{k=1}^{m_{\eta,l}} \sum_{j=1}^{m_{i,l}} w_{k,j}^{(i)} \|z_{\eta,l}^{(k)} - v_{i,l}^{(j)}\|^2,$$
subject to $\sum_{k=1}^{m_{\eta,l}} q_{\eta,l}^{(k)} = 1$; $q_{\eta,l}^{(k)} \ge 0$, $k = 1, \ldots, m_{\eta,l}$; $\sum_{j=1}^{m_{i,l}} w_{k,j}^{(i)} = q_{\eta,l}^{(k)}$, $i \in C_\eta$, $k = 1, \ldots, m_{\eta,l}$; $\sum_{k=1}^{m_{\eta,l}} w_{k,j}^{(i)} = p_{i,l}^{(j)}$, $i \in C_\eta$, $j = 1, \ldots, m_{i,l}$; $w_{k,j}^{(i)} \ge 0$, $i \in C_\eta$, $k = 1, \ldots, m_{\eta,l}$, $j = 1, \ldots, m_{i,l}$.
2.2 Fix $q_{\eta,l}^{(k)}$, $w_{k,j}^{(i)}$, $i \in C_\eta$, $1 \le k \le m_{\eta,l}$, $1 \le j \le m_{i,l}$. Update $z_{\eta,l}^{(k)}$, $k = 1, \ldots, m_{\eta,l}$, by
$$z_{\eta,l}^{(k)} = \frac{\sum_{i \in C_\eta} \sum_{j=1}^{m_{i,l}} w_{k,j}^{(i)} v_{i,l}^{(j)}}{\sum_{i \in C_\eta} \sum_{j=1}^{m_{i,l}} w_{k,j}^{(i)}}.$$
2.3 Compute $\sum_{i \in C_\eta} \sum_{k=1}^{m_{\eta,l}} \sum_{j=1}^{m_{i,l}} w_{k,j}^{(i)} \|z_{\eta,l}^{(k)} - v_{i,l}^{(j)}\|^2$. If the rate of decrease from the previous iteration is below a threshold, go to Step 3; otherwise, go to Step 2.1.
3. Compute $L(B, A, c)$. If the rate of decrease from the previous iteration is below a threshold, stop; otherwise, go back to Step 1.
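The inner update can be sketched in Python as well. The version below is an illustrative reconstruction for a single cluster and a single signature component: Step 2.1 is one joint linear program over $q$ and all $w^{(i)}$ via SciPy's linprog, Step 2.2 is the weighted-mean update, and a fixed iteration count stands in for the rate-of-decrease test of Step 2.3. All names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def update_prototype(images, z, n_iter=10):
    """Update one prototype alpha = {(z_k, q_k)} for the images in a cluster.
    `images` is a list of (v, p) pairs: support points (m_i x dim arrays) and
    probabilities (length m_i). `z` (K x dim) holds initial support points."""
    z = np.asarray(z, dtype=float)
    K = len(z)
    sizes = [len(p) for _, p in images]
    nw = K * sum(sizes)                  # number of w_{k,j}^{(i)} variables
    for _ in range(n_iter):              # stands in for the Step 2.3 threshold test
        # Step 2.1: fix z; solve jointly for q and all w by linear programming.
        c = np.concatenate([np.zeros(K)] + [
            ((z[:, None, :] - v[None, :, :]) ** 2).sum(-1).ravel()
            for v, _ in images])
        A_eq, b_eq = [], []
        row = np.zeros(K + nw); row[:K] = 1.0
        A_eq.append(row); b_eq.append(1.0)           # sum_k q_k = 1
        off = K
        for (v, p), m in zip(images, sizes):
            for k in range(K):                       # sum_j w_kj - q_k = 0
                row = np.zeros(K + nw); row[k] = -1.0
                row[off + k * m: off + (k + 1) * m] = 1.0
                A_eq.append(row); b_eq.append(0.0)
            for j in range(m):                       # sum_k w_kj = p_j
                row = np.zeros(K + nw)
                row[off + j: off + K * m: m] = 1.0
                A_eq.append(row); b_eq.append(p[j])
            off += K * m
        res = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq,
                      bounds=(0, None), method="highs")
        q, w_flat = res.x[:K], res.x[K:]
        # Step 2.2: fix q, w; each z_k becomes the w-weighted mean of the v's.
        num, den, off = np.zeros_like(z), np.zeros(K), 0
        for (v, _), m in zip(images, sizes):
            w = w_flat[off: off + K * m].reshape(K, m)
            num += w @ v; den += w.sum(axis=1)
            off += K * m
        z = num / np.maximum(den, 1e-12)[:, None]    # guard unused support points
    return z, q

# Toy usage: two images, each a 2-point distribution in 2-D, and K = 2.
imgs = [(np.array([[0., 0.], [1., 0.]]), np.array([0.5, 0.5])),
        (np.array([[0., 1.], [1., 1.]]), np.array([0.4, 0.6]))]
print(update_prototype(imgs, np.array([[0.2, 0.5], [0.9, 0.5]])))
```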
21 Mixture Modeling. Model by Gaussian: spherical covariance, prototype being the mean. If $X \sim N(\mu, \sigma^2 I)$, $X \in \mathbb{R}^k$, then $\|X - \mu\|^2 \sim (\gamma : 2\sigma^2, \frac{k}{2})$, i.e., a Gamma distribution with scale $b = 2\sigma^2$ and shape $s = k/2$.
22 Fitting Gamma Density. The pdf of $(\gamma : b, s)$:
$$f(u) = \frac{(u/b)^{s-1} e^{-u/b}}{b\,\Gamma(s)}, \quad u \ge 0.$$
ML estimation of $(\gamma : b, s)$:
$$\log \hat{s} - \psi(\hat{s}) = \log\left[\bar{u} \Big/ \Big(\prod_{i=1}^{n} u_i\Big)^{1/n}\right], \qquad \hat{b} = \bar{u}/\hat{s},$$
where the di-gamma function is $\psi(s) = \frac{d \log \Gamma(s)}{ds}$, $s > 0$. Fitted example: $(\gamma : b = 86.34, s = 3.5)$.
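For illustration, a small Python sketch (an assumption, not the authors' code) that solves the ML equation for $\hat{s}$ numerically with SciPy's digamma and a bracketing root finder, then sets $\hat{b} = \bar{u}/\hat{s}$:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def fit_gamma_ml(u):
    """ML fit of (gamma : b, s) from samples u_i > 0 by solving
    log(s) - psi(s) = log(arithmetic mean / geometric mean)."""
    u = np.asarray(u, dtype=float)
    rhs = np.log(u.mean()) - np.log(u).mean()   # log(AM) - log(GM) >= 0
    s_hat = brentq(lambda s: np.log(s) - digamma(s) - rhs, 1e-6, 1e6)
    return u.mean() / s_hat, s_hat              # (b_hat, s_hat)

# Sanity check on synthetic draws from the fitted example (b = 86.34, s = 3.5).
rng = np.random.default_rng(0)
print(fit_gamma_ml(rng.gamma(shape=3.5, scale=86.34, size=10000)))
```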
23 Mixture Model. The mixture model has $M$ prototypes $\{\alpha_1, \alpha_2, \ldots, \alpha_M\} \subset \Omega$. The mixture density:
$$\phi(\beta) = \sum_{\eta=1}^{M} \omega_\eta \left(\frac{1}{\sqrt{\pi b}}\right)^{2s} e^{-D(\beta, \alpha_\eta)/b}.$$
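Given fitted $(b, s)$, prototype weights $\omega$, and the distances $D(\beta, \alpha_\eta)$, evaluating the mixture density is a one-liner; this sketch (names assumed, not from the slides) uses the identity $(1/\sqrt{\pi b})^{2s} = (\pi b)^{-s}$:

```python
import numpy as np

def mixture_density(D, omega, b, s):
    """phi(beta) = sum_eta omega_eta * (pi*b)**(-s) * exp(-D_eta / b),
    where D[eta] = D(beta, alpha_eta)."""
    D = np.asarray(D, dtype=float)
    return float(np.sum(np.asarray(omega) * (np.pi * b) ** (-s) * np.exp(-D / b)))

# Toy example: 3 prototypes, uniform weights, the Gamma parameters fitted above.
print(mixture_density([50.0, 120.0, 300.0], np.full(3, 1 / 3), b=86.34, s=3.5))
```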
24 Annotation
25 Word Probabilities. Total word list: $W = \{w_1, w_2, \ldots, w_K\}$. Semantic categories containing word $w_i$: $C(w_i)$. Model of category $m$: $M_m$, $m = 1, \ldots, M$. Prior on categories: $\rho_m$ (set uniform). Category probability given signature $\beta$:
$$p_m(\beta) = \frac{\rho_m f(\beta \mid M_m)}{\sum_{l=1}^{M} \rho_l f(\beta \mid M_l)}.$$
Word probability: $q(\beta, w_i) = \sum_{m : m \in C(w_i)} p_m(\beta)$.
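Putting the pieces together, a hedged sketch of the annotation step: category posteriors $p_m(\beta)$ from the per-category mixture likelihoods, then word scores $q(\beta, w)$ by summing posteriors over the categories whose description contains the word. The likelihoods, priors, and word-to-category map below are hypothetical inputs.

```python
import numpy as np

def word_probabilities(likelihoods, rho, word_to_categories):
    """likelihoods[m] = f(beta | M_m); rho[m] = category prior;
    word_to_categories[w] = C(w), the indices of categories whose
    description contains word w."""
    joint = np.asarray(rho) * np.asarray(likelihoods)
    p = joint / joint.sum()                        # p_m(beta)
    return {w: float(p[list(cats)].sum())          # q(beta, w)
            for w, cats in word_to_categories.items()}

# Toy example: 3 categories, uniform prior, two words; rank words by score.
scores = word_probabilities([0.2, 0.5, 0.3], np.full(3, 1 / 3),
                            {"flower": {0, 1}, "sky": {2}})
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```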
28 ALIPR Online Demo:
29 ALIPR Online Demo flickr.com images
30 Human Evaluation on flickr.com Images. Manual evaluation on 5,411 flickr.com images. Accuracy of the first word: 51.17%.
31 Human Evaluation on flickr.com Images. Coverage rate: percentage of images correctly annotated by at least one word. Top 4 words: > 80%. Top 7 words: 91.37%. Top 15 words: 98.13%.
32 Human Evaluation on flickr.com Images. Annotating each image with its top 15 words yields 4.1 correct words per image on average.
33 ALIPR Interactive
34 ALIPR Interactive
35 ALIPR Interactive
36 Speed. Training: 109 seconds per category on average (80 images per category) on a 2.4 GHz AMD processor. Annotation: 1.4 seconds on average for the example images on a 3.0 GHz Intel processor, covering conversion from JPEG to raw format, signature extraction, and selection of annotation words.
37 Conclusions. System: the ALIPR system, real-time automatic annotation of pictures; human evaluation on Web images. Learning methodology: D2-clustering, a generalization of k-means to bags of weighted vectors; mixture modeling via mapping to a hypothetical space. Human evaluation on 5,400+ Web images has demonstrated promising results. Future work: bridge with retrieval, incremental learning, improved modeling, Web applications. ALIPR your pictures: