Image Representation Using Bag-of-phrases

ACTA AUTOMATICA SINICA, Vol. 38, No. 1, January 2012


ZHANG Lin-Bo 1, 2    WANG Chun-Heng 1    XIAO Bai-Hua 1    SHAO Yun-Xue 1

Abstract  Bag-of-words representation, in which an image is represented as a histogram of the numbers of occurrences of particular visual words, has demonstrated impressive levels of performance in the past few years. However, the relative position information between the visual words is almost entirely ignored. In this paper, the potential strength of this relative position information is investigated and a new kind of representation named Bag-of-phrases is proposed. The effectiveness of this strategy is validated on two benchmark databases. The classification results demonstrate that our Bag-of-phrases strategy achieves better results than the Bag-of-words method.

Key words  Image representation, spatial layout, Bag-of-words, Bag-of-phrases, scale-invariant feature transform (SIFT) descriptor

Manuscript received July 28, 2011; accepted October 9, 2011
Supported by National Natural Science Foundation of China
Recommended by Associate Editor LIU Yi-Jun
1. Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing    2. China Academy of Transportation Sciences, Beijing

Commonly used image features include LBP (Local binary pattern) [1], HOG (Histogram of oriented gradients) [2], and color histograms [3]. Neisser [4] divided visual processing into two stages: a preattentive stage (Preattentive stage), in which salient features pop out (pop-out), and an attentive stage (Attentive stage).

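Local features are obtained by first detecting interest points or regions [5] and then describing the image patch around each of them [6-7]. Approaches that build an image representation on local features include Bag-of-words [8-9], part-structure (Part-structure) models [10-11], and the methods of [12], [13], and [14].

The Bag-of-words model is described by Zhu et al. [15] and Csurka et al. [8]. Csurka et al. [8] call the quantized local features visual words (Visual words); Zhu et al. [15] call them keyblocks (Keyblock). The model has four steps: extract local features from the images; cluster the features (for example with K-means or Mean-shift [16]) to build a codebook; assign each local feature to a code word (optionally by soft assignment (Soft assignment) [17]); and count the code-word occurrences to obtain the histogram that represents the image (Fig. 1).

Fig. 1 Illustration of Bag-of-words model

Bag-of-words discards the spatial layout of the local features. Lazebnik et al. [18] addressed this with the spatial pyramid (Spatial pyramid) Bag-of-words.

The rest of the paper is organized as follows: Section 1 reviews Bag-of-words; Section 2 analyzes Bag-of-words and spatial pyramid Bag-of-words; Section 3 presents Bag-of-phrases; Section 4 reports the experiments.

1 Bag-of-words

Spatial pyramid matching (Spatial pyramid matching, SPM) follows the pyramid match of Grauman et al. [19], which compares histogram pyramids (Histogram pyramids). Lazebnik et al. [18] apply the idea to Bag-of-words: the image is partitioned into increasingly fine grids and a Bag-of-words histogram is computed in every cell (Fig. 2). With L = 2 the finest level is a 4 × 4 grid; with L = 1 it is a 2 × 2 grid.

Fig. 2 Illustration of spatial pyramid Bag-of-words models

2 Analysis of Bag-of-words

Fig. 3 Spatial pyramid Bag-of-words vectors of different image contents (panels (a), (b), (c))

The Bag-of-words model is borrowed from text analysis, in which a document (Document) is represented by the words (Word) it contains. Local features in an image play the role of words in a document (Fig. 4).

Fig. 4 The relationship between local features in image and words in document

3 Bag-of-phrases

A visual word (Visual words) together with the spatial layout of its neighbors forms a visual phrase (Visual phrase). Replacing visual words with visual phrases turns Bag-of-words into Bag-of-phrases.

3.1 Bag-of-phrases

The representation is built as follows:
1) For image $I_i$, detect $N_i$ interest points $P_i = \{p_{i1}, p_{i2}, \cdots, p_{iN_i}\}$;
2) For each $p_{ij}$, compute a local descriptor $f_{ij}$ of the patch around $p_{ij}$; $f_{ij}$ is the visual word (Visual word) of $p_{ij}$;
3) Describe the spatial layout around $p_{ij}$ by $s_{ij}$;
4) Combine $f_{ij}$ and $s_{ij}$ of $p_{ij}$ into the visual phrase $v_{ij}$;
5) As in Bag-of-words, cluster the visual phrases to obtain the code phrases (Code phrase);
6) Assign every visual phrase to the code phrases and count the occurrences of each code phrase to obtain the Bag-of-phrases vector.

3.2 Spatial layout

The spatial layout is modeled in the manner of Shape-context [20]. For an image $I_i$ (Fig. 5 (a)):
1) Around each $p_{ij}$ in $I_i$, draw concentric circles that divide the neighborhood into $R$ radius bins and $O$ orientation bins (Fig. 5 (b));
2) Count the neighboring visual words in each bin, which gives $s_{ij}$;
3) ... $s_{ij}$ ... $p_{ij}$ ...;
4) Weight the entries of $s_{ij}$ that belong to the $t$-th ring by $1/\delta_t$.

Fig. 5 Modeling spatial layout of visual words. (a) Local features on original image; (b) Spatial layout of neighbor visual words around the center of concentric circles in 4 radius bins and 12 orientation bins

For every $p_{ij}$, $f_{ij}$ and $s_{ij}$ together give $v_{ij}$. Code phrases are generated in the same way as the code words (Code word) of Bag-of-words, for example with K-means or Mean-shift, and the Bag-of-phrases vector is built like the Bag-of-words vector; soft assignment [17] can be used as well.

4 Experiments

Bag-of-phrases is evaluated on Caltech-101 and on the PASCAL Visual object challenge (VOC) 2006. The settings are:
1) Interest points are detected with the method of [21], with N = 1 000;
2) The local descriptor $f_{ij}$ of $p_{ij}$ is SIFT [22];
3) The spatial layout of $p_{ij}$ is computed as in Section 3.2, with $M$ and $O$ set to 5 and 12; the 5 radii are 4, 8, 14, 22, 32;
4) $s_{ij}$ is weighted by $\delta_t$ and scaled to [0, 255], the value range of SIFT (Scale-invariant feature transform); the weights satisfy $\delta_1 < \delta_2 < \delta_3 < \delta_4 < \delta_5$, with $\delta_t = t$;
5) The codebook is generated by K-means, as in the Bag-of-words of Csurka et al. [8];
6) The Bag-of-phrases vectors are classified with LibSVM, using the SVM kernel
$$k(x_1, x_2) = \exp\{-d_{\chi^2}(x_1, x_2)/\sigma\},$$
where $\sigma$ is set to 100 and
$$d_{\chi^2}(x_1, x_2) = \frac{1}{2}\sum_{q=1}^{Q}\frac{[x_1(q) - x_2(q)]^2}{x_1(q) + x_2(q)},$$
with $x_1(q)$ and $x_2(q)$ the $q$-th components of the $Q$-dimensional vectors $x_1$ and $x_2$.

4.1 Caltech-101

On Caltech-101 [23], 30 images of each class are used for training, 101 × 30 = 3 030 images in total. Bag-of-phrases is combined with one-against-rest support vector machines (Support vector machine, SVM): 101 classifiers $\zeta_1, \zeta_2, \cdots, \zeta_{101}$ are trained. For a test image $I_\tau$, the 101 classifiers output $p_\tau^1, p_\tau^2, \cdots, p_\tau^{101}$, and the predicted label $l_\tau$ of $I_\tau$ is
$$l_\tau = \arg\max_{t = 1, \cdots, 101} p_\tau^t.$$
If $l_\tau$ equals the true label of $I_\tau$, $I_\tau$ is classified correctly; otherwise it is misclassified.

Table 1 compares Bag-of-phrases on Caltech-101 with Bag-of-words [8] and spatial pyramid Bag-of-words [18]. Bag-of-phrases achieves better precision than Bag-of-words.

Table 1 Comparison of precisions on Caltech-101 dataset (%)
Method                                  Precision (%)
Bag-of-words, K =                       ±
Spatial pyramid Bag-of-words, K =       ±
Bag-of-phrases, K =                     ±
Bag-of-words, K =                       ±
Spatial pyramid Bag-of-words, K =       ±
Bag-of-phrases, K =                     ±

4.2 PASCAL VOC 2006

The second experiment uses PASCAL VOC 2006 (Visual object challenge 2006) rather than the more recent PASCAL VOC 2010. The dataset has 10 classes: Bicycle, Bus, Car, Cat, Cow, Dog, Horse, Motorbike, Person, and Sheep. It provides 2 618 training images and 2 686 test images. The results are compared with INRIA Larlus [24], a Bag-of-words method. Performance on PASCAL VOC 2006 is measured by ROC (Receiver operating characteristic) curves, shown in Fig. 6.

Fig. 6 The relationship between words in document and local features in image

4.3 Discussion

A natural extension of Bag-of-phrases is to group visual phrases into visual sentences (Visual sentences) and to build a Bag-of-sentences representation.

The term Visual phrase is also used in [25-26]. In [25], a Visual phrase is a collocation pattern of visual words (Visual words). The Visual phrase of [26] is a different notion and is not composed of Visual words.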
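The spatial layout descriptor of Section 3.2 and the visual phrase of Section 3.1 can be illustrated with a short Python sketch. This is not the authors' implementation. It assumes that each bin simply counts the neighbors falling into it, that ring $t$ is weighted by $1/\delta_t$ with $\delta_t = t$, and that $v_{ij}$ is the concatenation of $f_{ij}$ and $s_{ij}$; the function names `spatial_layout` and `visual_phrase` are hypothetical. The rescaling of $s_{ij}$ to [0, 255] is left out.

```python
import math

# Settings quoted from the experimental section: ring radii and orientation bins.
RADII = (4, 8, 14, 22, 32)
N_ORIENT = 12


def spatial_layout(center, neighbors, radii=RADII, n_orient=N_ORIENT):
    """Shape-context-style histogram of neighbor positions around `center`.

    Assumption: each bin counts the neighbors that fall into it, and a count
    in ring t (1-based) is weighted by 1 / delta_t with delta_t = t.
    """
    hist = [0.0] * (len(radii) * n_orient)
    cx, cy = center
    for x, y in neighbors:
        dx, dy = x - cx, y - cy
        dist = math.hypot(dx, dy)
        if dist == 0 or dist > radii[-1]:
            continue  # ignore the center itself and points beyond the outer circle
        ring = next(i for i, r in enumerate(radii) if dist <= r)
        angle = math.atan2(dy, dx) % (2 * math.pi)  # angle mapped to [0, 2*pi]
        sector = min(int(angle / (2 * math.pi) * n_orient), n_orient - 1)
        hist[ring * n_orient + sector] += 1.0 / (ring + 1)
    return hist


def visual_phrase(descriptor, layout):
    """Assumption: the phrase v_ij is the concatenation of f_ij and s_ij."""
    return list(descriptor) + list(layout)


# Three neighbors inside the outer circle, one outside it.
s = spatial_layout((0, 0), [(3, 0), (0, 5), (-10, 0), (100, 100)])
v = visual_phrase([0.0] * 128, s)  # placeholder 128-dimensional SIFT vector
```

With 5 rings and 12 orientation bins the layout part has 60 entries, so a phrase built on a 128-dimensional SIFT descriptor has 188 entries under these assumptions.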
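The $\chi^2$ kernel and the one-against-rest decision rule of Section 4 can be written out directly. The following Python sketch only restates the two formulas; it is not the LibSVM code used in the experiments. The helper names are hypothetical, and skipping bins that are empty in both histograms is an assumption made to avoid division by zero.

```python
import math


def chi2_distance(x1, x2):
    """d = 1/2 * sum_q (x1[q] - x2[q])^2 / (x1[q] + x2[q]) for non-negative histograms."""
    total = 0.0
    for a, b in zip(x1, x2):
        if a + b > 0:  # assumption: bins empty in both histograms contribute 0
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def chi2_kernel(x1, x2, sigma=100.0):
    """k(x1, x2) = exp(-d(x1, x2) / sigma), with sigma = 100 as in the text."""
    return math.exp(-chi2_distance(x1, x2) / sigma)


def predict_label(scores):
    """One-against-rest decision: 1-based index of the largest classifier output."""
    return max(range(len(scores)), key=lambda t: scores[t]) + 1
```

A kernel of this form is usually handed to the SVM as a precomputed Gram matrix, with `chi2_kernel` evaluated for every pair of Bag-of-phrases vectors.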

References

1 Ojala T, Pietikainen M, Maenpaa T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7)
2 Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE
3 Swain M J, Ballard D H. Color indexing. International Journal of Computer Vision, 1991, 7(1)
4 Neisser U. Visual search. Scientific American, 1964, 210(6)
5 Tuytelaars T, Mikolajczyk K. Local invariant feature detectors: a survey. Foundations and Trends in Computer Graphics and Vision, 2008, 3(3)
6 Mikolajczyk K, Schmid C. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(10)
7 Li J, Allinson N. A comprehensive review of current local features for computer vision. Neurocomputing, 2008, 71(10-12)
8 Csurka G, Dance C R, Fan L, Willamowski J, Bray C. Visual categorization with bags of keypoints. In: Workshop on Statistical Learning in Computer Vision. Prague, Czech Republic: ECCV
9 Yang J C, Yu K, Gong Y H, Huang T. Linear spatial pyramid matching using sparse coding for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE
10 Fergus R, Perona P, Zisserman A. Object class recognition by unsupervised scale-invariant learning. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Madison, USA: IEEE
11 Felzenszwalb P F, Huttenlocher D P. Pictorial structures for object recognition. International Journal of Computer Vision, 2005, 61(1)
12 Shotton J, Blake A, Cipolla R. Multiscale categorical object recognition using contour fragments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(7)
13 Serre T, Wolf L, Bileschi S, Riesenhuber M, Poggio T. Robust object recognition with cortex-like mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(3)
14 Torralba A, Murphy K P, Freeman W T. Contextual models for object detection using boosted random fields. In: Proceedings of the Neural Information Processing Systems. Vancouver, Canada: NIPS
15 Zhu L, Rao A B, Zhang A D. Theory of keyblock-based image retrieval. ACM Transactions on Information Systems, 2002, 20(2)
16 Comaniciu D, Meer P. Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(5)
17 Gemert J C, Geusebroek J M, Veenman C J, Smeulders A W M. Kernel codebooks for scene categorization. In: Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer
18 Lazebnik S, Schmid C, Ponce J. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the IEEE Computer Vision and Pattern Recognition. New York, USA: IEEE
19 Grauman K, Darrell T. The pyramid match kernel: discriminative classification with sets of image features. In: Proceedings of the IEEE International Conference on Computer Vision. Beijing, China: IEEE
20 Belongie S, Malik J, Puzicha J. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(4)
21 Loupias E, Sebe N, Bres S, Jolion J M. Wavelet-based salient points for image retrieval. In: Proceedings of the International Conference on Image Processing. Vancouver, Canada: IEEE
22 Lowe D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2)
23 Li F F, Fergus R, Perona P. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: Proceedings of the Conference on Computer Vision and Pattern Recognition Workshop. Washington D. C., USA: IEEE, 2004
24 Everingham M, Zisserman A, Williams C, Gool L V. The PASCAL visual object classes challenge 2006 (VOC2006) results [Online], available: http://www.pascalnetwork.org/challenges/voc/voc2006/results.pdf, July 20, 2011
25 Yuan J, Wu Y, Yang M. Discovery of collocation patterns: from visual words to visual phrases. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, USA: IEEE
26 Sadeghi M A, Farhadi A. Recognition using visual phrases. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE, 2011

ZHANG Lin-Bo  Ph. D. candidate at the Institute of Automation, Chinese Academy of Sciences. His research interest covers pattern recognition, machine learning, and content based image classification. Corresponding author of this paper. E-mail: linbo.zhang@ia.ac.cn

WANG Chun-Heng  Professor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers pattern recognition, intelligent systems, image processing, character recognition, and artificial intelligence. E-mail: chunheng.wang@ia.ac.cn

XIAO Bai-Hua  Professor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers pattern recognition, intelligent systems, and multimedia information processing and retrieval. E-mail: baihua.xiao@ia.ac.cn

SHAO Yun-Xue  Ph. D. candidate at the Institute of Automation, Chinese Academy of Sciences. His research interest covers pattern recognition, machine learning, and handwritten Chinese character recognition. E-mail: yunxue.shao@ia.ac.cn
