Learning Spa+otemporal Graphs of Human Ac+vi+es

Similar documents
Collective Activity Detection using Hinge-loss Markov Random Fields

Modeling Complex Temporal Composition of Actionlets for Activity Prediction

Probabilistic Logic CNF for Reasoning

A Human Behavior Recognition Method Based on Latent Semantic Analysis

Distinguish between different types of scenes. Matching human perception Understanding the environment

Graphical Object Models for Detection and Tracking

Sequential Interval Network for Parsing Complex Structured Activity

Human Action Recognition under Log-Euclidean Riemannian Metric

Global Behaviour Inference using Probabilistic Latent Semantic Analysis

STA 4273H: Statistical Machine Learning

Sound Recognition in Mixtures

Finding Group Interactions in Social Clutter

Human activity recognition in the semantic simplex of elementary actions

arxiv: v1 [cs.ro] 1 Mar 2017

Statistical Methods for NLP

Mining Motion Atoms and Phrases for Complex Action Recognition

Sta$s$cal sequence recogni$on

Juergen Gall. Analyzing Human Behavior in Video Sequences

Statistical Learning Reading Assignments

Predicting Human Activities Using Stochastic Grammar

Deep Learning of Invariant Spatiotemporal Features from Video. Bo Chen, Jo-Anne Ting, Ben Marlin, Nando de Freitas University of British Columbia

Anomaly Localization in Topic-based Analysis of Surveillance Videos

CASE E : A Hierarchical Event Representation for the Analysis of Videos

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

Activity Recognition using Dynamic Subspace Angles

Tensor Canonical Correlation Analysis for Action Classification

University of Genova - DITEN. Smart Patrolling. video and SIgnal Processing for Telecommunications ISIP40

STA 414/2104: Machine Learning

GraphRNN: A Deep Generative Model for Graphs (24 Feb 2018)

Recent Advances in Bayesian Inference Techniques

Density Propagation for Continuous Temporal Chains Generative and Discriminative Models

Kernel Density Topic Models: Visual Topics Without Visual Words

Statistical Filters for Crowd Image Analysis

CS 7180: Behavioral Modeling and Decision- making in AI

Parsing Video Events with Goal inference and Intent Prediction

Time Series Classification

Singlets. Multi-resolution Motion Singularities for Soccer Video Abstraction. Katy Blanc, Diane Lingrand, Frederic Precioso

Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity- Representativeness Reward

Object Detection Grammars

Feedback Loop between High Level Semantics and Low Level Vision

Opportunities and challenges in quantum-enhanced machine learning in near-term quantum computers

Discriminative Learning of Sum-Product Networks. Robert Gens Pedro Domingos

Probabilistic Latent Semantic Analysis

Probabilistic Graphical Models

CS 6140: Machine Learning Spring What We Learned Last Week 2/26/16

Hidden Markov Models

Sum-Product Networks: A New Deep Architecture

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017

Bayesian networks Lecture 18. David Sontag New York University

COMP90051 Statistical Machine Learning

N-mode Analysis (Tensor Framework) Behrouz Saghafi

Anticipating Visual Representations from Unlabeled Data. Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

UNDERSTANDING EVENTS IN VIDEO GAL LAVEE

Hidden Markov Models

Environmental Sound Classification in Realistic Situations

Intelligent Systems (AI-2)

A Novel Activity Detection Method

Lecture 13: Tracking mo3on features op3cal flow

Semi-Supervised Learning with Graphs. Xiaojin (Jerry) Zhu School of Computer Science Carnegie Mellon University

Dialogue Systems. Statistical NLU component. Representation. A Probabilistic Dialogue System. Task: map a sentence + context to a database query

Intelligent Systems (AI-2)

Conditional Random Field

Shape of Gaussians as Feature Descriptors

Non-parametric Hidden Conditional Random Fields for Action Classification

EEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1

Efficient Action Spotting Based on a Spacetime Oriented Structure Representation

Pattern Recognition. Parameter Estimation of Probability Density Functions

Intelligent Systems (AI-2)

Networks. Can (John) Bruce Keck Founda7on Biotechnology Lab Bioinforma7cs Resource

Mathematical Formulation of Our Example

Lecture 15. Probabilistic Models on Graph

Parametric Models Part III: Hidden Markov Models

Anomaly Detection for the CERN Large Hadron Collider injection magnets

Machine Learning Techniques for Computer Vision

Mixture Models and EM

Tensor Methods for Feature Learning

Latent Dirichlet Allocation Introduction/Overview

1/30/2012. Reasoning with uncertainty III

Statistical Methods for NLP

Early Human Actions Detection using BK Sub-triangle Product

Study Notes on the Latent Dirichlet Allocation

Knowledge Compilation and Machine Learning A New Frontier Adnan Darwiche Computer Science Department UCLA with Arthur Choi and Guy Van den Broeck

10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification

10 : HMM and CRF. 1 Case Study: Supervised Part-of-Speech Tagging

Deriving Uncertainty of Area Estimates from Satellite Imagery using Fuzzy Land-cover Classification

Global Scene Representations. Tilke Judd

Semi-Supervised Learning

ParaGraphE: A Library for Parallel Knowledge Graph Embedding

Facial Expression Recogni1on Using Ac1ve Appearance

Undirected Graphical Models

Tennis player segmentation for semantic behavior analysis

Template-Based Representations. Sargur Srihari

Scheduling Top- Down and Bo4om- Up Processes

Introduction to Graphical Models

Su Liu 1, Alexandros Papakonstantinou 2, Hongjun Wang 1,DemingChen 2

Semi-Supervised Learning in Gigantic Image Collections. Rob Fergus (New York University) Yair Weiss (Hebrew University) Antonio Torralba (MIT)

Probabilistic and Logistic Circuits: A New Synthesis of Logic and Machine Learning

Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models

Temporal Factorization Vs. Spatial Factorization

Lecture 17: Face Recogni2on

Transcription:

Learning Spa+otemporal Graphs of Human Ac+vi+es William Brendel Sinisa Todorovic

Our Goal Long Jump Triple Jump Recognize all occurrences of activities Identify the start and end frames Parse the video and find all subactivities Localize actors and objects involved

Weakly Supervised Setting Weight Lifting Large-Box Lifting In training: > ONLY class labels Domain knowledge of temporal structure: > NOT AVAILABLE

Learning What and How Weak supervision in training Need to learn from training videos: What ac+vity parts are relevant How relevant they are for recogni+on

Prior Work vs. Our Approach Typically, focus only on HOW semantic level model gap features raw video

Prior Work vs. Our Approach Typically... semantic level model features raw video gap semantic level model mid-level features raw video

Prior Work Video Representation Space- +me points Laptev & Schmid 08, Niebles & Fei- Fei 08, S+ll human postures SoaLo 07, Ning & Huang 08, Ac+on templates Yao & Zhu 09, Point tracks Sukthankar & Hebert 10,

Our Features: 2D+t Tubes Allow simpler: - Modeling - Learning (few examples) - Inference Sukthankar & Hebert 07, Gorelick & Irani 08, Pritch & Peleg 08,... We are the first to use 2D+t tubes for building a sta+s+cal model of ac+vi+es

Our Features: 2D+t Tubes Allow simpler: - Modeling - Learning (few examples) - Inference Sukthankar & Hebert 07, Gorelick & Irani 08, Pritch & Peleg 08,... We use 2D+t tubes for building a sta+s+cal genera+ve model of ac+vi+es

Prior Work Activity Representation Graphical models, Grammars - Ivanov & Bobick 00 - Xiang & Gong 06 - Ryoo & Aggawal 09 - Gupta & Davis 09 - Liu & Zhu 09 - Niebles & Fei- Fei 10 - Lan et al. 11 Probabilis+c first- order logic - Tran & Davis 08 - Albanese et al. 10 - Morariu & Davis 11 - Brendel et al. 11...

Approach Input Spa+otemporal Ac+vity Recogni+on Video Graph Model Localiza+on

Blocky Video Segmentation

Activity as a Spatiotemporal Graph Descriptors of nodes and edges: Node descriptors: F - Motion - Object shape Adjacency Matrices: {A i } - Allen temporal relations - Spatial relations - Compositional relations

Activity as Segmentation Graph G = (V, E, "descriptors") = (F, {A 1,..., A n }) node descriptors adjacency matrices of distinct relations between the tubes

Activity Graph Model Probabilistic Graph Mixture model node descriptors mixture weights compositional model adjacency matrices spatial + + * * * temporal

Activity Model An ac+vity instance: G = (F, {A 1,..., A n }) Model adjacency matrices Edge type: i =1, 2,..., n

Activity Model An ac+vity instance: G = (F, {A 1,..., A n }) Model adjacency matrices Edge type: i =1, 2,..., n Model matrix of node descriptors

Inference Input Spa+otemporal Ac+vity Recogni+on Video Graph Model Localiza+on

Inference = Robust Least Squares Goal: For every ac+vity model Es+mate the permuta+on matrix subject to

Learning the Activity Graph Model Training videos Training graphs Graph model

Learning Given K training graphs, Adjacency matrix Node descriptor Edge type: i =1, 2,..., n

Learning Given K training graphs, Adjacency matrix ESTIMATE Node descriptor Model parameters

Learning Given K training graphs, Adjacency matrix ESTIMATE Node descriptor Permutation matrix

Learning = Robust Least Squares Given K Training graphs: Es4mate: and

Learning = Structural EM E-step à expected model structure M-step à matching of the training graphs and model Estimatation of model parametrs Estimation of permutation matrices

Learning Results Correctly learned ac+vity- characteris+c tubes

Recognition and Segmentation Ac+vity handshaking Detected and segmented characteris+c tube

Recognition and Segmentation Ac+vity kicking Detected and segmented characteris+c tube

Classification on UTexas Dataset Human interac+on ac+vi+es [18] Ryoo et al. 10

Conclusion Fast spatiotemporal segmentation New activity representation = graph model Unified learning and inference = Least squares Learning under weak supervision: - WHAT activity parts are relevant and - HOW relevant they are for recognition