Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig


Multimedia Databases Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

Previous Lecture
- Hidden Markov Models (continued from last lecture)
- Introduction to Video Retrieval

Multimedia Databases, Wolf-Tilo Balke, Institut für Informationssysteme, TU Braunschweig

10 Video Retrieval - Shot Detection
- 10.1 Video Abstraction
- 10.2 Shot Detection
- 10.3 Statistical Structure Models
- 10.4 Temporal Models
- 10.5 Shot Activity

10.1 Video Abstraction
- Temporal and spatial structuring of the content of a video
- Important for queries with a temporal aspect, e.g., "Find clips in which an object falls down!"
- Basically, two sub-domains:
  - Video modeling and representation
  - Video segmentation and summarization

10.1 Video Abstraction
- Video modeling: general structure of a video, from coarse to fine
  - Story units
  - Structural units
  - Shots
  - Frames (including key frames)

10.1 Example
- News broadcast
- Story unit: War in Iraq
- Structural units:
  - Introduction: "The fighting around the city..."
  - Transmission: various scenes of war
  - Summary: "The reaction of the federal parliament..."

10.1 Example
- Shots:
  - Anchorman in a studio
  - Pan across a desert landscape
  - Bombing of a city
  - Refugees
  - Anchorman in a studio
  - Speech in the parliament
- Every shot has typical frames and is usually represented by some key frame

10.2 Shot Detection
- But how can shots be detected?
- With the introduction of MPEG-7, shot detection comes ready-made
  - MPEG-7 is a metadata standard
  - The correct decomposition is already stored in the metadata
  - Camera information is easy to extract
- But semantic annotation is unfortunately very expensive
- Archive material still needs a lot of manual work

10.2 Shot Detection
- A clip consists of many scenes
- Images belonging to a scene are relatively similar to each other
  - Example: anchorman in the newsroom, desert landscape
- For this reason, efficient video retrieval does not have to index each individual frame; it is enough to index key frames

10.2 Shot Detection
- Problems in finding key frames
- Detecting a scene transition, which can be hard or soft
  - A hard transition is called a cut
  - A soft transition is a dissolve (blending) or a fade in/out
- Selecting a representative image: by random selection, with regard to the camera movement, as an image with average characteristic values, ...

10.2 Shot Detection
- To group frames into shots, each transition has to be recognized
- With uncompressed videos
  - The information from each image is used optimally, but the procedure is relatively inefficient
- Or with compressed videos
  - E.g., only data about the changes between frames is available

10.2 Shot Detection
- Shot detection in uncompressed videos
- Template matching (Zhang et al., 1993)
- Pixel-wise comparison: for each pixel (x, y) in the image, the color value of the pixel in this frame is compared with its color value in a later frame
- If the change between the two frames is large enough (larger than a predefined threshold), a cut is assumed
- This only works for hard transitions

10.2 Template Matching
- $D_{cut} = \sum_{x,y} |I(x, y, t) - I(x, y, t+1)|$
- It is impossible to distinguish small changes in a large area from major changes in a small area
- Susceptible to noise, object movements and changes in camera angle
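The pixel-wise difference and its thresholding can be sketched in a few lines. This is a minimal illustration, assuming gray-value frames stored as nested lists of integers; the function names and the threshold value are hypothetical and not taken from the lecture.

```python
def pixel_difference(frame_a, frame_b):
    """Sum of absolute gray-value differences over all pixel positions (D_cut)."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
    )


def detect_cuts_pixelwise(frames, threshold):
    """Return every index t for which a cut is assumed between frame t and t + 1."""
    return [
        t
        for t in range(len(frames) - 1)
        if pixel_difference(frames[t], frames[t + 1]) > threshold
    ]


# Tiny usage example: two dark frames followed by a bright one.
frames = [
    [[0, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[255, 255], [255, 255]],
]
cuts = detect_cuts_pixelwise(frames, threshold=100)
```

Because the sum runs over all pixels, one large local change and many small global changes can produce the same value, which is the weakness noted on the slide.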

10.2 Histograms
- Histogram-based methods (Tonomura, 1991)
- Assumption: frames containing identical foreground and background elements have a similar brightness distribution
- Classification based on the brightness values
- Each histogram column counts the image pixels with a specified brightness value

10.2 Histograms
- Let H(j, t) be the histogram value for the j-th brightness value in frame t
- $D_{cut} = \sum_{j} |H(j, t) - H(j, t+1)|$
- Once again, a predefined threshold decides whether there is a cut or not
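A small sketch of the histogram difference, assuming 8-bit gray values stored as nested lists of integers (function names are hypothetical). It also shows why the measure is robust: moving pixels around inside the frame leaves the histogram unchanged.

```python
def brightness_histogram(frame, bins=256):
    """Count how many pixels of the frame have each brightness value."""
    hist = [0] * bins
    for row in frame:
        for value in row:
            hist[value] += 1
    return hist


def histogram_difference(frame_a, frame_b, bins=256):
    """Sum of absolute bin-wise differences between two brightness histograms."""
    hist_a = brightness_histogram(frame_a, bins)
    hist_b = brightness_histogram(frame_b, bins)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


# Same pixel values at different positions: the histogram does not change.
original = [[0, 255], [10, 10]]
rearranged = [[10, 0], [255, 10]]
# A completely dark frame versus a completely bright frame.
dark = [[0, 0], [0, 0]]
bright = [[255, 255], [255, 255]]
```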

10.2 Histograms
- Histograms are invariant to image rotation and change only slightly under
  - Object translation
  - Occlusions caused by moving objects
  - Slow camera movements
  - Zooming
- Significantly less error-sensitive than template matching

10.2 Threshold
- A good choice of thresholds is important
  - Thresholds that are too low produce false cuts
  - Thresholds that are too high lead to missed cuts
- The selection depends on the type of videos (training)
- Choose the threshold such that as few cuts as possible are overlooked, but not too many false cuts are produced

10.2 Threshold
- Selection, e.g., using distribution functions
- [Figure: number of frame pairs over the difference value, with one distribution for differences within sequences and one for differences between sequences]
- Select the threshold with the minimal error rate
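One way to read "selection by minimal error rate" is a simple search over candidate thresholds on labeled training differences. The sketch below assumes two hypothetical lists, one with differences measured within shots and one with differences measured across true cuts, and weights false cuts and missed cuts equally; other weightings are equally plausible.

```python
def select_threshold(within_shot, across_cut):
    """Pick the threshold with the fewest false cuts plus missed cuts.

    A difference d is classified as a cut if d > threshold.
    Returns (threshold, number_of_errors); ties go to the lowest threshold.
    """
    candidates = sorted(set(within_shot) | set(across_cut))
    best_threshold, best_errors = None, None
    for threshold in candidates:
        false_cuts = sum(1 for d in within_shot if d > threshold)
        missed_cuts = sum(1 for d in across_cut if d <= threshold)
        errors = false_cuts + missed_cuts
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


# Hypothetical training differences with one overlapping outlier (8).
within_shot = [1, 2, 3, 8]
across_cut = [6, 9, 10]
result = select_threshold(within_shot, across_cut)
```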

10.2 Twin-Thresholding
- For smooth transitions (dissolves, fades, ...) there are only small changes between consecutive frames
- Still, the differences between the middle frames of different shots are large enough
- Idea: use two thresholds
  - One for the determination of hard cuts
  - And one for the soft cuts

10.2 Twin-Thresholding
- Twin comparison (Zhang et al., 1993)
- Threshold t_c corresponds to the size of an intolerable change in the pixel intensities
- With a lower threshold t_s we can detect possible starts of smooth transitions
- If a possible smooth transition is detected at time t, the frame at this time is marked as a reference frame
- The next frames are compared against this reference frame

10.2 Twin-Thresholding
- For some fixed n, the differences of the subsequent frames in the interval [t + 1, t + n] are not computed against the direct predecessor, but against the reference frame t
- Only if this difference rises above the threshold t_c is there a soft cut; otherwise, differences are again computed between consecutive frames
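The twin comparison can be sketched as a single pass over the frames. This is an illustrative reading of the description above, not the original implementation: `diff` stands for any frame difference (pixel-wise or histogram-based), and the way the scan resumes after a detected soft cut is an assumption.

```python
def twin_comparison(frames, diff, t_s, t_c, n):
    """Detect hard cuts (as indices) and soft cuts (as (start, end) index pairs)."""
    hard_cuts, soft_cuts = [], []
    t = 0
    while t < len(frames) - 1:
        d = diff(frames[t], frames[t + 1])
        if d > t_c:
            # Large jump between neighbors: hard cut.
            hard_cuts.append(t)
            t += 1
        elif d > t_s:
            # Possible start of a smooth transition: frame t becomes the reference.
            end = None
            last = min(t + n, len(frames) - 1)
            for k in range(t + 1, last + 1):
                if diff(frames[t], frames[k]) > t_c:
                    end = k
                    break
            if end is not None:
                soft_cuts.append((t, end))
                t = end  # assumption: continue scanning after the transition
            else:
                t += 1  # no soft cut: fall back to consecutive comparison
        else:
            t += 1
    return hard_cuts, soft_cuts


# Usage with frames reduced to one brightness value each (hypothetical data):
# a slow fade from 0 to 12, then an abrupt jump to 50.
brightness = [0, 0, 0, 3, 6, 9, 12, 12, 12, 50, 50]
hard, soft = twin_comparison(
    brightness, lambda a, b: abs(a - b), t_s=2, t_c=10, n=5
)
```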

10.2 Twin-Thresholding
- Example: [Figure: frame difference over time, with a hard cut, possible soft cuts, a rejected candidate (no soft cut) and an accepted soft cut marked]

10.2 Block-based Techniques
- Block-based techniques try to avoid the problem of noise and different camera settings (Idris and Panchanathan, 1996)
- Each frame is divided into r blocks
- Local characteristics are calculated for each block
- Corresponding blocks (sub-frames) are compared

10.2 Block-based Techniques
- Advantages
  - Block-wise comparison lets us detect and ignore effects that occur in only part of the picture
  - E.g., movement of the anchorman's head
- If a high number of the r blocks are the same in two consecutive frames, this indicates that the frames belong to the same shot
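A minimal sketch of a block-wise comparison, assuming the mean brightness as the local characteristic of each block. The tolerance `block_tol` and the required fraction of unchanged blocks are hypothetical parameters chosen for illustration.

```python
def block_means(frame, block_size):
    """Mean brightness of every block, scanning blocks row by row."""
    height, width = len(frame), len(frame[0])
    means = []
    for by in range(0, height, block_size):
        for bx in range(0, width, block_size):
            values = [
                frame[y][x]
                for y in range(by, min(by + block_size, height))
                for x in range(bx, min(bx + block_size, width))
            ]
            means.append(sum(values) / len(values))
    return means


def same_shot(frame_a, frame_b, block_size, block_tol, min_same_fraction):
    """True if enough corresponding blocks are (nearly) unchanged."""
    means_a = block_means(frame_a, block_size)
    means_b = block_means(frame_b, block_size)
    unchanged = sum(1 for a, b in zip(means_a, means_b) if abs(a - b) <= block_tol)
    return unchanged / len(means_a) >= min_same_fraction


# A 4x4 frame, a copy with a local change in the top-left block, and a new scene.
frame_a = [[10] * 4 for _ in range(4)]
frame_b = [[200, 200, 10, 10], [200, 200, 10, 10], [10] * 4, [10] * 4]
frame_c = [[200] * 4 for _ in range(4)]
```

With 2x2 blocks, the local change in `frame_b` affects only one of four blocks, so the pair is still treated as belonging to the same shot.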

10.2 Model-based Procedure
- There is only a small number of possible transitions between two shots
- Idea: model the transitions as mathematical operations
- Characteristic temporal patterns in video streams can then be detected
- Advantage: this recognizes not only transitions, but also their type

10.2 Model-based Procedure
- E.g., a temporal model for fades
- When fading out, the pictures of the first shot become darker: the brightness histogram is compressed in the x direction
- Then there are some (almost) black frames
- When fading in, the images of the second shot become brighter: the histogram is stretched in the x direction

10.2 Model-based Procedure
- This behavior can be interpreted as the application of mathematical operations to the histogram and observed on a stream of frames
- Determining the start and end of the fade out/in process delivers the shot boundaries
- Similar models can be set up for other transitions (e.g., dissolve)

10.2 E.g., Fade Out, Fade In
- [Figure: fade out followed by fade in]

10.2 SD in Compressed Videos
- Shot detection in compressed videos
- Compressed storage is needed due to the size of video data
- Pixel-based methods for shot detection work on uncompressed videos
  - Very computationally intensive

10.2 SD in Compressed Videos
- Shot detection is also possible on the compressed data, at the price of a trade-off between efficiency and accuracy
- Approaches are based on the MPEG compression information
  - Cosine transformation coefficients
  - Motion vector information

10.2 MPEG Compression
- Compression is based on encoding the changes between frames
- I-frames are coded independently (I: intra-coded, i.e., independent of other frames)
- P-frames are encoded with change information from the preceding I- or P-frame (P: predicted)
- B-frames are interpolated between two reference frames, i.e., I- or P-frames (B: bi-directional)
- B-frames can thus be calculated both from the preceding and from the subsequent reference frame (depending on the encoder)

10.2 MPEG Compression
- A shot is thus a chain of I-, P- and B-frames: IBPBPBIBPBP...
- The video stream is rearranged for transmission: IPBPBPBIPBP...

10.2 Shot Boundaries in MPEG
- I-frames are independently encoded
- Direct access to the DC components allows measuring differences between two consecutive I-frames
- Recognition methods can be applied directly to DC-frames
- Accuracy is limited: between two I-frames there are usually about 15 B- and P-frames

10.2 Cosine Transformation
- I-frames are usually compressed with the discrete cosine transform (DCT)
  - E.g., MPEG, H.264, Motion JPEG, ...
- Each image is divided into blocks (e.g., 8x8 pixels in JPEG)
- Each block is transformed separately using the DCT
- The first coefficient (DC) of the DCT is proportional to the average intensity of the block
- A DC-frame is created by using only the DCs of all the blocks and ignoring all the higher coefficients
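The relation between the DC coefficient and the block average can be checked numerically. The sketch below assumes the orthonormal 2D DCT-II on n x n blocks (the scaling used for 8x8 blocks in JPEG), under which the DC coefficient equals n times the block mean. In a real decoder the DC values would be read from the bitstream rather than recomputed from pixels; the function names are hypothetical.

```python
import math


def dct_coefficient(block, u, v):
    """Coefficient (u, v) of the orthonormal 2D DCT-II of a square n x n block."""
    n = len(block)
    cu = 1 / math.sqrt(2) if u == 0 else 1.0
    cv = 1 / math.sqrt(2) if v == 0 else 1.0
    total = 0.0
    for x in range(n):
        for y in range(n):
            total += (
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
            )
    return (2.0 / n) * cu * cv * total


def dc_frame(frame, n=8):
    """Keep only the DC coefficient of every n x n block.

    Assumes the frame height and width are multiples of n.
    """
    return [
        [
            dct_coefficient([row[bx:bx + n] for row in frame[by:by + n]], 0, 0)
            for bx in range(0, len(frame[0]), n)
        ]
        for by in range(0, len(frame), n)
    ]


# An 8x16 frame: a flat left block (value 10) and a horizontal ramp 0..7 on the right.
frame = [[10] * 8 + list(range(8)) for _ in range(8)]
dc = dc_frame(frame)
```

Here the left block has mean 10 and the right block mean 3.5, so the DC-frame holds 8 times these means.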

10.2 Cosine Transformation
- A sequence of DC-frames is called a DC sequence
- DC sequences abstract video clips without having to decode them
- Taskiran and Delp (1998) form "generalized traces": traces of features extracted from DC-frames
- Scene change detection can be performed on these trace features by using a threshold

10.2 Motion Vectors
- (Block) motion vectors can be extracted directly from an MPEG bitstream
- Observation: the number of motion vectors in consecutive frames belonging to the same shot is similar
- Example of shot detection (Zhang et al., 1993)
  - Determine the number of motion vectors in the P- and B-frames
  - If this number is smaller than a specified threshold, the frame probably represents a shot boundary

10.2 Hybrid Approaches
- Procedures using DCT coefficients and motion vectors can be combined
  - This increases the recognition accuracy
- Utilization of the various frame types in MPEG
  - E.g., Meng et al., 1995

10.2 Shot Detection
- Shot detection at work with the MSU Video tool
- Shot detection algorithms:
  - Pixel-wise comparison
  - Global histogram
  - Block-based histogram
  - Motion-based detection

10.2 Shot Detection
- E.g., shot detection on the Avatar movie trailer
- [Figure: results of the motion-based, pixel-level, global histogram and block-based histogram algorithms]

10.3 Statistical Structural Models
- Idea: decomposition of a video into semantic units (shots)
- Previously: low-level primitives (brightness, color information, movements, ...)
- Now: perceptual features (e.g., visual structure of the whole video)
- Film theory: stylistic elements
  - Montage: temporal structure, editing, ...
  - Mise-en-scène: spatial structure, scenery, lighting, camera position, ...

10.3 Statistical Structural Models
- Goal: build models of stylistic elements
  - Allows the extraction of semantic features for characterization and classification
  - Provides background information for using low-level features in shot boundary detection

10.3 Example
- Movie trailers arranged according to average shot length (montage) and activity during shots (mise-en-scène)

10.3 Example
- Shot duration and shot activity are very rough categories, but have equivalents in movie directing
- Basic trend: the shorter the shot, the higher the action (and vice versa)
- If we roughly divide movies into the categories action film, comedy and romance, then we can cluster according to these categories

10.3 Example
- Clusters can be explained through film theory
- If emotions have to be conveyed, long passages of text and detailed facial expressions (a long close-up) are required
- The development of a character and their connection with the audience takes time
- Charles Chaplin: "Tragedy is a close-up, comedy a long shot."

10.3 Example
- For action or suspense, rhythmic patterns are used (e.g., Psycho or The Birds by Hitchcock)
- Fast cuts require continuous adaptation by the viewer and create confusion
- Long dialogues are unnecessary; people express themselves through actions

10.3 Video Structure
- Semantic structure assists in categorizing
  - Either based on film theory
  - Or learned from a sample collection
- More semantics emerge from high-level structure patterns than from low-level features
  - Statistical inference

10.3 Assumption
- The more a video is structured, the more semantic information can be derived from it
- News programs are highly structured and relatively easy to fragment
- Home-made videos are mostly unstructured and almost impossible to fragment

10.3 Classical Elements
- The classical element of movie directing (montage) is the shot duration
- Classical elements of the mise-en-scène are more difficult to capture
- Activity in scenes is important
  - Not only between actors (explosions, ...)
  - Often correlates with violence
  - But also mood (e.g., brightness, colors)

10.4 Temporal Models
- Temporal video structure: shot boundaries can be modeled as a series of events occurring in succession
- Analogy from queuing theory: arrivals of persons
- Modeling through a Poisson process
  - The number of events in a fixed time interval follows a Poisson distribution
  - The temporal distance between two successive events is exponentially distributed

10.4 Temporal Video Structure
- Problem 1: the exponential distribution leads to many short, but very few long shots
- Problem 2: the exponential distribution has no memory, i.e., the probability that a shot change happens within the next t > 0 time units is independent of the time that has already elapsed since the last shot change

10.4 Temporal Video Structure
- Alternative models: shot durations are not exponentially distributed, but follow distributions like the
  - Erlang distribution
  - Weibull distribution
- Objective: estimate the model parameters from a training collection in which the shot boundaries were determined manually
  - Maximum likelihood estimate
- This knowledge can then assist in detecting the shot boundaries of unknown videos

10.4 Erlang Model
- Assume that shot durations are Erlang distributed
- The length τ of a (fixed) shot has probability density
  $f(\tau; r, \lambda) = \frac{\lambda^r \tau^{r-1}}{(r-1)!} e^{-\lambda\tau}$
- Generalization of the exponential distribution (r = 1)
- Expected value (average shot duration): r/λ
- The sum of r independent random variables that are exponentially distributed with parameter λ is (r, λ)-Erlang distributed

10.4 Erlang Model
- [Figure: Erlang densities for (r = 1, λ = ½), (r = 2, λ = ½), (r = 3, λ = ½), (r = 5, λ = 1) and (r = 9, λ = 2)]

10.4 Erlang Model
- The sum of r independent random variables that are exponentially distributed with parameter λ is (r, λ)-Erlang distributed
- It represents a Poisson process in which only every r-th event is counted
- r = 2: establishing the context with the whole image, followed by a zoom on the essential details
- r = 3: emotional development, followed by an action, followed by the result of this action

10.4 Erlang Model
- Likelihood function for a single Erlang distributed random variable:
  $L(r, \lambda; \tau) = \frac{\lambda^r \tau^{r-1}}{(r-1)!} e^{-\lambda\tau}$
- Corresponding log-likelihood function:
  $\log L(r, \lambda; \tau) = r \log\lambda + (r-1) \log\tau - \lambda\tau - \log (r-1)!$
- Choose the optimal parameters r and λ for a sample $\tau_1, \ldots, \tau_N$ of N independent and identically Erlang distributed random variables:
  $(\hat{r}, \hat{\lambda}) = \arg\max_{r,\lambda} \sum_{i=1}^{N} \log L(r, \lambda; \tau_i)$

10.4 Erlang Model
- Optimization problem over a discrete variable (r) and a continuous variable (λ)
- Film theory: r is small
- Brute-force solution:
  - Test all r = 1, ..., 10 and compute the optimal λ for each
  - Choose the pair (r, λ) that maximizes the above expression

10.4 Erlang Model
- If r is known, then the determination of λ is simplified
- Setting the derivative with respect to λ to zero yields:
  $\hat{\lambda} = \frac{rN}{\sum_{i=1}^{N} \tau_i}$
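The brute-force estimation can be sketched as follows. This is a minimal illustration assuming the standard Erlang log-likelihood, for which the best λ for a fixed r is r divided by the mean shot duration; the function names and the sample durations are hypothetical.

```python
import math


def erlang_log_likelihood(durations, r, lam):
    """Log-likelihood of i.i.d. Erlang(r, lam) shot durations."""
    n = len(durations)
    return (
        n * r * math.log(lam)
        + (r - 1) * sum(math.log(t) for t in durations)
        - lam * sum(durations)
        - n * math.lgamma(r)  # lgamma(r) equals log((r - 1)!)
    )


def fit_erlang(durations, max_r=10):
    """Try r = 1..max_r, use the closed-form optimal lam for each, keep the best pair."""
    mean = sum(durations) / len(durations)
    best = None
    for r in range(1, max_r + 1):
        lam = r / mean  # zero of the derivative with respect to lam
        ll = erlang_log_likelihood(durations, r, lam)
        if best is None or ll > best[2]:
            best = (r, lam, ll)
    return best


# Hypothetical manually labeled shot durations in seconds.
durations = [2.0, 4.0, 6.0, 8.0]
r_hat, lam_hat, ll_hat = fit_erlang(durations)
```

Restricting r to a small range follows the film-theory argument above and keeps the search trivial.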

10.4 Erlang Model
- Estimation of the parameters r and λ from a training collection: [Figure: estimation results]

10.4 Erlang Model
- The Erlang distribution solves the first problem (distribution of shot durations)
- Problem 2, however, remains
  - The Erlang distribution itself has memory, but the exponentially distributed random variables underlying each shot have no memory
- Solution: Weibull distribution (a generalization of the exponential distribution)

10.5 SD through Shot Activity
- To assess the activity within one shot, we can again rely on low-level features
- One possibility: the difference of the color histograms of two consecutive frames
- Goal: determine a statistical model for the activity within a shot with the help of histograms

10.5 Shot Activity
- Film theory: continuity in editing
  - In order not to confuse the audience, the frames separated by a cut should differ clearly
- Segment the video into regular frames (state S = 0) and shot boundaries (state S = 1)
  - Attempt to classify each frame either as a regular frame or as a shot boundary
  - Additionally use low-level features such as color histograms

10.5 Shot Activity
- Experience: training data for shot activity cannot be approximated well enough by a single standard distribution
- Therefore, use several different distribution components (Vasconcelos and Lippman, 2000)

10.5 Shot Activity
- Activity within shots (S = 0)
- Mixture of four random variables: three Erlang distributed, one uniformly distributed
- [Figure: distribution over the distance]

10.5 Shot Activity
- Activity in shot transitions (S = 1)
- Mixture of two random variables: one normally distributed, one uniformly distributed
- [Figure: distribution over the distance]

10.5 Shot Boundary Detection
- Application of statistics: given two frames, there are two hypotheses
  - H_0: there is no cut in between (S = 0)
  - H_1: there is a cut in between (S = 1)
- Likelihood ratio test: choose H_1 if $P(D \mid S = 1) > P(D \mid S = 0)$ (or equivalently: $\log \frac{P(D \mid S = 1)}{P(D \mid S = 0)} > 0$) and H_0 otherwise
  - D is the measured distance between the two frames

10.5 Shot Boundary Detection
- The likelihood ratio test uses no knowledge about typical shot durations
- However, we know the a-priori distribution of the shot duration (or we can at least estimate it)
- Therefore, we now use Bayesian statistics to test the two hypotheses
- In this way we obtain a generalization of the basic thresholding method for histogram differences

10.5 Shot Boundary Detection
- Notation:
  - δ: duration of each frame (constant, determined by the frame rate)
  - $S_{t,t+\delta}$: indicates whether there is a shot boundary between frame t and its immediate successor (or not)
  - $D_{t,t+\delta}$: distance between frame t and its immediate successor
  - $S_t$: vector with components $S_{0,\delta}, S_{\delta,2\delta}, \ldots, S_{t,t+\delta}$
  - $D_t$: vector with components $D_{0,\delta}, D_{\delta,2\delta}, \ldots, D_{t,t+\delta}$

10.5 Shot Boundary Detection Hypothesis H 1 (there is a shot change)is valid, if Equivalent formulation: log > 0 > Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 70

10.5 Shot Boundary Detection If there was a cut at time t, and none in the interval [t, t + τ], then the probability for a cut in the interval [t + τ, t + τ + δ ] according to Bayes, is: γ is a normalization constant On the other hand, the probability that there is no cut, is: Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 71

10.5 Shot Boundary Detection Thus: Supposition: D t, t + δ is conditionally independent (with S t, t + δ ) from all other D and S Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 72

10.5 Shot Boundary Detection Behavior of conditional probabilities for activity (is estimated from the training collection, shot activity) Behavior of the probabilities for cuts (estimated from the training collection, distribution of shot duration) So hypothesis H 1 is valid if the logarithm of the above expression is positive Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 73

10.5 Hypothesis Verification Intuitive interpretation The left side uses information about the normal frame distances within shots and shot transitions The right part uses knowledge regarding the normal distribution of the shot duration (a priori probability) Multimedia Databases Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 74

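10.5 Hypothesis Verification
Let t be the time of the last cut and τ the time elapsed since then
Let $p(\tau)$ be the distribution density of the elapsed time from t until the first cut after t
The log posterior odds ratio is then:
$\log \frac{P(D_{t+\tau,t+\tau+\delta} \mid S_{t+\tau,t+\tau+\delta} = 1)}{P(D_{t+\tau,t+\tau+\delta} \mid S_{t+\tau,t+\tau+\delta} = 0)} + \log \frac{\int_{\tau}^{\tau+\delta} p(\alpha)\, d\alpha}{\int_{\tau+\delta}^{\infty} p(\alpha)\, d\alpha}$
(same as before, just different notation)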

10.5 Hypothesis Verification
According to our initial Bayesian approach, we can decide whether there is a shot transition at point t + τ or not by using the following threshold-based estimation
If the last cut took place at time t and we now observe $D_{t+\tau,t+\tau+\delta}$, then there is a new cut if and only if:
$\log \frac{P(D_{t+\tau,t+\tau+\delta} \mid S_{t+\tau,t+\tau+\delta} = 1)}{P(D_{t+\tau,t+\tau+\delta} \mid S_{t+\tau,t+\tau+\delta} = 0)} > T(\tau) = \log \frac{\int_{\tau+\delta}^{\infty} p(\alpha)\, d\alpha}{\int_{\tau}^{\tau+\delta} p(\alpha)\, d\alpha}$

10.5 Hypothesis Verification
This means: with the introduction of the a priori probability, the verification of our hypotheses no longer depends on a fixed threshold
The threshold changes dynamically with the time elapsed since the last cut
The density of the time between cuts can be assumed to be an Erlang or Weibull distribution density
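The effect of a threshold that depends on the time since the last cut can be sketched as a simple scan over the frame transitions. The function `detect_cuts`, the toy threshold `toy_threshold` and the numbers below are hypothetical and serve only as an illustration; the sketch also assumes that the video starts with a cut, so that the elapsed time starts at zero.

```python
import math

def detect_cuts(log_likelihood_ratios, threshold, delta):
    """Return the indices of the frame transitions that are declared cuts.

    log_likelihood_ratios[i] is the log likelihood ratio (cut vs. no cut)
    observed for the transition from frame i to frame i + 1.
    threshold(tau) is the threshold after tau time units without a cut.
    """
    cuts = []
    tau = 0.0  # time elapsed since the last cut (assumed cut at the start)
    for i, llr in enumerate(log_likelihood_ratios):
        if llr > threshold(tau):
            cuts.append(i)
            tau = 0.0      # restart the clock after a cut
        else:
            tau += delta   # one more frame without a cut
    return cuts

def toy_threshold(tau):
    # Made-up threshold: high right after a cut, then dropping.
    return 5.0 * math.exp(-tau) + 1.0

observed = [0.5, 3.0, 0.2, 0.1, 3.0, 0.3]
print(detect_cuts(observed, toy_threshold, delta=1.0))
```

In this toy run, the same observed value 3.0 is accepted as a cut once some time has passed since the previous cut, whereas it would be rejected directly after a cut, when the threshold is still high.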

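10.5 Erlang Model
Density function of the Erlang distribution:
$\varepsilon_{r,\lambda}(\tau) = \frac{\lambda^r \tau^{r-1} e^{-\lambda\tau}}{(r-1)!}$
For the Erlang model, the following threshold function results:
$T(\tau) = \log \frac{\sum_{i=1}^{r} \varepsilon_{i,\lambda}(\tau+\delta)}{\sum_{i=1}^{r} \left[ \varepsilon_{i,\lambda}(\tau) - \varepsilon_{i,\lambda}(\tau+\delta) \right]}$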
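A minimal sketch of the Erlang threshold, assuming an Erlang density with integer order r and rate λ, and using the fact that the probability of no cut up to time τ equals the sum of the Erlang densities of orders 1 to r divided by λ (the factor 1/λ cancels in the ratio). The function names and the parameter values are chosen only for illustration.

```python
import math

def erlang_pdf(tau, r, lam):
    """Erlang density of order r (integer >= 1) and rate lam at time tau."""
    return lam ** r * tau ** (r - 1) * math.exp(-lam * tau) / math.factorial(r - 1)

def erlang_threshold(tau, delta, r, lam):
    """log of P(no cut in the next delta) / P(cut in the next delta),
    given that tau time units have passed since the last cut."""
    later = sum(erlang_pdf(tau + delta, i, lam) for i in range(1, r + 1))
    now = sum(erlang_pdf(tau, i, lam) for i in range(1, r + 1))
    return math.log(later) - math.log(now - later)

# Hypothetical parameters: order 2, rate 0.5 per second, 25 frames per second.
for tau in (0.0, 1.0, 5.0, 20.0):
    print(tau, round(erlang_threshold(tau, delta=0.04, r=2, lam=0.5), 3))
```

For r = 1 the Erlang density is the exponential density, and the threshold does not depend on τ at all. For very large τ the sums underflow to zero, so a production version would need a numerically safer formulation.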

10.5 Erlang Model
Typical behavior of the threshold over time: (figure)

10.5 Erlang Model
Initially, the threshold is high
  Cuts are unlikely
  Cuts are therefore accepted only if the frame differences are very large
Then, the threshold drops
  Cuts are accepted for clearly smaller changes in the features
The problem is the asymptotic convergence to a positive value
  Constant level for several consecutive soft cuts

10.5 Erlang Model
For all Erlang thresholds we have:
$\lim_{\tau \to \infty} T(\tau) = -\log\left(e^{\lambda\delta} - 1\right)$
and thus there is always such a limiting threshold
The problem comes from the assumption of the underlying exponential distribution in the Erlang model
Here, too, the solution is the Weibull distribution
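The difference between the two models can be sketched numerically. The sketch assumes a Weibull distribution with shape α and scale β, whose probability of no cut up to time τ is exp(-(τ/β)^α); inserting this into the threshold ratio gives the closed form used below. The parameter names and values are assumptions for illustration only.

```python
import math

def erlang_limit(lam, delta):
    """Value that every Erlang threshold with rate lam approaches for large tau."""
    return -math.log(math.expm1(lam * delta))

def weibull_threshold(tau, delta, alpha, beta):
    """Threshold for a Weibull prior with shape alpha and scale beta."""
    growth = ((tau + delta) ** alpha - tau ** alpha) / beta ** alpha
    return -math.log(math.expm1(growth))

print("Erlang limit:", round(erlang_limit(lam=0.5, delta=0.04), 3))
for tau in (0.0, 10.0, 100.0):
    print(tau, round(weibull_threshold(tau, delta=0.04, alpha=2.0, beta=3.0), 3))
```

With a shape parameter above 1, the Weibull threshold in this sketch keeps falling and eventually becomes negative, whereas the Erlang limit for a small λδ stays positive.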

10.5 Experimental Verification
Experimental verification (Vasconcelos and Lippman, 2000)
Test on a collection of cinema trailers
Training (determination of the model parameters) with the objects from the collection
Task: segmentation of a new trailer ("Blankman")

10.5 Experimental Verification
Trailer for "Blankman" (figure)

10.5 Experimental Verification
For each trailer, simple color histogram distances were used to determine the activity
The fixed threshold was chosen as well as possible (through tests)
O: missed cut
*: falsely detected cut
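A simple color histogram distance between consecutive frames might look like the following sketch. The slides do not say which histogram resolution or distance was used, so the quantization into 4 bins per channel and the L1 distance are assumptions, as are the function names and the tiny synthetic frames.

```python
def color_histogram(pixels, bins=4):
    """Normalized RGB histogram of a non-empty list of (r, g, b) values in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel to one of `bins` equally wide intervals.
        index = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[index] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]

def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (between 0 and 2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def frame_distances(frames, bins=4):
    """Distance between each frame and its immediate successor."""
    hists = [color_histogram(frame, bins) for frame in frames]
    return [histogram_distance(a, b) for a, b in zip(hists, hists[1:])]

# Three synthetic 16-pixel frames: red, slightly different red, blue.
red = [(255, 0, 0)] * 16
similar_red = [(250, 10, 5)] * 16
blue = [(0, 0, 255)] * 16
print(frame_distances([red, similar_red, blue]))
```

The two red frames fall into the same histogram bin and have distance 0, while the change to the blue frame yields the maximum distance of 2.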

10.5 Experimental Verification
Fixed threshold: (figure)

10.5 Experimental Verification
Weibull threshold: (figure)

10.5 Experimental Verification
Direct comparison of two samples (figure)
Fixed threshold
Weibull threshold

10.5 Experimental Verification
Total number of errors: (figure)
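Counting the two error types (missed cuts and falsely detected cuts) can be sketched as follows. The function name and the frame indices are made up for illustration and are not the results of the experiment; the sketch also assumes that a detected cut only counts as correct if it matches a true cut position exactly.

```python
def segmentation_errors(true_cuts, detected_cuts):
    """Return (missed cuts, falsely detected cuts, total number of errors)."""
    true_set, detected_set = set(true_cuts), set(detected_cuts)
    missed = sorted(true_set - detected_set)         # marked "O" on the slides
    false_alarms = sorted(detected_set - true_set)   # marked "*" on the slides
    return missed, false_alarms, len(missed) + len(false_alarms)

# Hypothetical frame indices of true and detected cuts.
print(segmentation_errors([10, 55, 120], [10, 60, 120, 200]))
```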

This Lecture
Video Retrieval - Shot Detection
Video Abstraction
Shot Detection
Statistical Structure Models
Temporal Models
Shot Activity

Next lecture
Video Signatures
Intuitive Video Similarity
Voronoi Video Similarity