Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time

Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time Experiment presentation for CS3710:Visual Recognition Presenter: Zitao Liu University of Pittsburgh ztliu@cs.pitt.edu March 3, 2015 Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 1 / 17

Goal Goal: Discover connections. Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 2 / 17

Goal Goal: Discover connections between recurring mid-level visual elements. Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 3 / 17

Goal Goal: Discover connections between recurring mid-level visual elements in historic (temporal) and geographic (spatial) image collections. Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 4 / 17

Goal Goal: Discover connections between recurring mid-level visual elements in historic (temporal) and geographic (spatial) image collections, and attempts to capture the underlying visual style. Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 5 / 17

Visual Style Visual style: Appearance variations of the same visual element due to change in time or location. Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 6 / 17

The Proposed Method Three steps: 1 Mining style-sensitive visual elements 2 Establishing correspondences 3 Training style-aware regression models Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 7 / 17

Experiment Data: 1 Three datasets are used in the paper and one dataset (CarDb) can be downloaded from the author s website. 2 CarDb: 2.96GB with 13,473 photos. (10130 for training and 3343 for testing) 3 CarDb contains cars made in 1920-1999 crawled from www.cardatabase.net Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 8 / 17

Experiment External packages: 1 VOC: extracting features. 2 LibSVM: making classification and regression. Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 9 / 17

Experiment Preprocessing: CarDb 1920 1930 1940 1950 1960 1970 1980 1990 8 decades in total. Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 10 / 17

Experiment - Step1 Goal: find visual elements whose appearance correlates with the date (label). What code does: 1 Select query images from each category. 2 Random sample patches with various scales and locations from each query image. 3 Each patch is represented by a histogram of gradients (HOG). Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 11 / 17

Experiment - Step1 Goal: find visual elements whose appearance correlates with the date (label). What code does: 1 Find top N(N = 50) nearest neighbor patches in the database. foreach query decade image subsets(querydec) foreach query image in querydec foreach match decade image subsets(matchdec) foreach image in matchdec(i) foreach detector in image I More than 10K images. Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 12 / 17

Experiment - Step1 Goal: find visual elements whose appearance correlates with the date (label). What code does: 1 For each cluster, compute the temporal distribution of labels. (compute entropy based on year distribution) 2 Find good clusters with peaky distributions 3 Rank good clusters and select top M (M = 80) as the discovered style-sensitive visual elements. Peaky examples: 1920 1, 1930 1, 1930 6, 1940 2, 1960 1, 1960 2, 1960 6, 1960 12, etc. Flat examples: 1930 9, 1930 10, 1940 6, etc. Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 13 / 17

Experiment - Step2 Goal: model the change in style of the same visual element over the entire label space. What code does: 1 Train a linear SVM with initial cluster images. Negative data are sampled from random Flickr images. 2 Run the learned SVM on a new subset of the data that slightly broader range in label space. 3 Keep the most confident prediction example and continue. Cross-validation is used in each step of the incremental revision. Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 14 / 17

Experiment - Step2 Good positive training examples. 1928 1925 1929 1927 1928 1940 1939 1940 1937 1938 1941 1941 1941 1941 1947 1959 1960 1955 1951 1953 1969 1970 1967 1965 1967 1972 1977 1979 1972 1977 1984 1989 1981 1989 1987 1991 1995 1991 1992 1991 Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 15 / 17

Experiment - Step3 Goal: Prediction. What code does: 1 Train a SVM with RBF kernels on the correspondences obtained from Step 2. 2 Each training example is weighted by a transformed detection score. Keep in mind: we have 8 decades; 10 clusters for each decades; 50 neighbors in each cluster. Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 16 / 17

Thank you! Q & A Zitao Liu (University of Pittsburgh) CS3710:Visual Recognition March 3, 2015 17 / 17