Statistical and Learning Techniques in Computer Vision
Lecture 1: Random Variables
Jens Rittscher and Chuck Stewart

1 Motivation

Imaging is a stochastic process: If we take all the different sources of error into account, it is hard to argue that digital images are the results of deterministic processes. One can argue that the images we obtain using a digital camera, a frame grabber, a microscope, or a thermal camera are realizations of a stochastic process.

Capturing statistics about appearance can help us make inferences: Figure 1 shows the different distributions for pixels that come from cars, from shadows, and from the background.

Figure 1: Pixels as random variables. The image shows a sample frame of a surveillance camera installed to monitor traffic. In order to build a background model of the scene, the set of pixels was divided into three different sets: background, cars, and shadow. The three graphs show the corresponding histograms P_b(x), P_c(x), and P_s(x) of the grey values of pixel locations from the different sets. Notice that the distribution of the background pixels P_b(x) is sharply peaked. The distribution of grey values of pixels P_c(x) that correspond to vehicles is, on the other hand, almost uniform.

As another example, we can learn distributions of skin colors as an aid to face detection in images.
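As a minimal sketch of how such distributions can be estimated (not part of the original notes; it assumes NumPy and hypothetical, hand-labelled pixel sets), each histogram is simply a normalized count of grey values:

    import numpy as np

    def grey_value_histogram(pixels, n_levels=256):
        """Normalized grey-value histogram: an empirical estimate of P(x)."""
        counts = np.bincount(np.asarray(pixels, dtype=int), minlength=n_levels)
        return counts / counts.sum()

    # Hypothetical pixel sets, e.g. obtained by hand-labelling one frame.
    background_pixels = np.clip(np.random.normal(120, 5, size=10000), 0, 255)
    car_pixels = np.random.uniform(0, 255, size=2000)

    P_b = grey_value_histogram(background_pixels)   # sharply peaked
    P_c = grey_value_histogram(car_pixels)          # close to uniform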

Structural and appearance variations of objects are difficult to model explicitly, especially when combined with the behavior of computer vision algorithms.

2 Lecture Overview

We will go through the following material relatively quickly. Much of it is expected to be a review.

- Random variables
- Probability distributions and densities
- Mean and variance
- The Gaussian distribution
- Expectation
- Conditional probability, independence and Bayes theorem

3 A starting point: pixels as random variables

In all of the examples above we view each pixel in the image as a random variable. The results of operations on pixels can also be modeled as random variables, e.g. smoothing operations, motion vector computations, or even edge detection results. Images are multivariate random variables.

4 Random variables

A random variable is a mathematical function that assigns outcomes of a random experiment to numbers. To every outcome ξ of this experiment we assign a number x(ξ), i.e.

    x : L → X,    (1)

where L is the domain of the random variable and X the range. Examples include:

- Rolling a die.
- Rolling a pair of dice. Note the difference in the domain!
- Taking a photograph and measuring the intensity at each pixel in an image.
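To make the definition concrete, here is a small illustrative sketch (plain Python, not from the notes) of the die examples as functions from outcomes to numbers; note how the domain L changes for the pair of dice:

    import itertools, random

    # One die: the domain L is {1, ..., 6}; x maps an outcome to its face value.
    L_one = range(1, 7)
    x_one = lambda outcome: outcome

    # A pair of dice: the domain L is the 36 ordered pairs of faces; one possible
    # random variable maps each pair to the sum of the two faces.
    L_pair = list(itertools.product(range(1, 7), repeat=2))
    x_pair = lambda outcome: outcome[0] + outcome[1]

    outcome = random.choice(L_pair)   # one realization of the experiment
    value = x_pair(outcome)           # the number x(outcome) assigned to it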

5 Distributions and Densities

The cumulative distribution function of a random variable x is the function

    F(x) = P{x ≤ x},    (2)

defined for every x ∈ X. A random variable x is continuous if its distribution function F is continuous; x is discrete if F is a staircase function. In case F is discontinuous but not a staircase function, the random variable x is of mixed type. Note that F(x) is a monotonically non-decreasing function of x, lim_{x→−∞} F(x) = 0, and lim_{x→∞} F(x) = 1.

The empirical distribution of a random variable x is constructed by performing an experiment n times and observing n values x_1, ..., x_n. Using the step function

    U(y) = 1 if y ≥ 0,  0 if y < 0,    (3)

the empirical distribution of the random variable is

    F_n(x) := (1/n) ∑_{i=1}^{n} U(x − x_i).    (4)

Stated less formally, F_n(x) is the fraction of the observed values less than or equal to x. As n → ∞, F_n → F.

The derivative of F(x),

    f(x) = dF(x)/dx,    (5)

is called the density of the random variable x. Properties of the density follow from properties of the distribution function. For discrete random variables, the density is a series of impulses (probability masses). At every x_0 where lim_{x→x_0+} F(x) ≠ lim_{x→x_0−} F(x), the probability mass is just

    p(x_0) = lim_{x→x_0+} F(x) − lim_{x→x_0−} F(x).    (6)
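A short sketch (assuming NumPy; the standard normal samples are only an example) of the empirical distribution F_n of equation (4):

    import numpy as np

    def empirical_cdf(samples, x):
        """F_n(x): the fraction of observed values less than or equal to x."""
        samples = np.asarray(samples)
        return np.mean(samples <= x)    # average of the step functions U(x - x_i)

    samples = np.random.normal(0.0, 1.0, size=1000)
    print(empirical_cdf(samples, 0.0))  # approaches F(0) = 0.5 as n grows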

Two or more random variables may be combined into a random vector (x_1, ..., x_k). The cumulative distribution of (x_1, ..., x_k) is

    F(x_1, ..., x_k) = P(x_1 ≤ x_1, ..., x_k ≤ x_k),    (7)

where on the right side of the equation the comma means "and". The density is obtained by the kth partial derivative:

    f(x_1, ..., x_k) = ∂^k F(x_1, ..., x_k) / (∂x_1 ... ∂x_k).    (8)

6 Mean and variance: the simplest statistics

The mean of a random variable is

    µ(x) = ∫ x f(x) dx.    (9)

The mean (vector) of a pair of random variables, x and y, is

    µ(x, y) = ∫∫ (x, y)^T f(x, y) dx dy.    (10)

Recall that this can be viewed as two double integrals:

    µ(x, y) = ( ∫∫ x f(x, y) dx dy,  ∫∫ y f(x, y) dx dy )^T.    (11)

The mean of a discrete random variable is consequently computed as

    µ(x) = ∑_{i=1}^{m} x_i p(x_i)  when  x ∈ {x_1, ..., x_m}.    (12)

The variance of a random variable is

    var(x) = ∫ (x − µ(x))² f(x) dx.    (13)

This is sometimes written as σ_x² or just σ².

The covariance matrix of two random variables is

    Σ(x, y) = ∫∫ (x − µ(x), y − µ(y))^T (x − µ(x), y − µ(y)) f(x, y) dx dy.    (14)

The off-diagonal term of the resulting (symmetric) matrix,

    ∫∫ (x − µ(x)) (y − µ(y)) f(x, y) dx dy,    (15)

is the covariance of x and y.
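In practice these integrals are usually replaced by sample averages. A brief sketch (assuming NumPy; the paired observations are synthetic, hypothetical data) of the corresponding sample estimates:

    import numpy as np

    # Hypothetical paired observations of two random variables x and y.
    xy = np.random.multivariate_normal([1.0, 2.0], [[2.0, 0.5], [0.5, 1.0]], size=5000)

    mu = xy.mean(axis=0)                 # estimate of the mean vector, equation (10)
    var_x = xy[:, 0].var()               # estimate of var(x), equation (13)
    Sigma = np.cov(xy, rowvar=False)     # 2 x 2 covariance matrix, equation (14)
    cov_xy = Sigma[0, 1]                 # off-diagonal term, equation (15)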

7 Expectation

Given a function g(x) of a random variable x and the density f(x) associated with x, the expectation of g is

    E[g(x)] = ∫ g(x) f(x) dx.    (16)

The generalization to a pair or sequence of random variables is straightforward. We can write the mean and variance in terms of the expectation operator:

    µ(x) = E[x]    (17)
    var(x) = E[(x − µ(x))²]    (18)

We can think of g(x) more generally as any function of x. Note that y = g(x) is a random variable if x is a random variable. We can use the expectation operator to compute the mean and variance of y based on the density of x.

8 The Normal or Gaussian Distribution

For many reasons this relatively simple distribution is the most widely used and studied. In the case of a single variable the normal distribution is written as

    p(x) = (1 / (√(2π) σ)) exp( −(x − µ)² / (2σ²) ),    (19)

where µ is the mean of the distribution, σ² is the variance, and σ is the standard deviation. The leading constant, (√(2π) σ)^{−1}, ensures that ∫ p(x) dx = 1, as required.

The generalized multivariate Gaussian is written as

    p(x) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp( −(1/2) (x − µ)^T Σ^{−1} (x − µ) ),    (20)

where d is the dimension of the vectors, µ is the mean vector (center) of the distribution, Σ is the d × d, positive semi-definite, covariance matrix, and |Σ| is the determinant of Σ.
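A sketch (assuming NumPy; the values of mu and Sigma below are arbitrary examples) that evaluates the multivariate density of equation (20) directly:

    import numpy as np

    def gaussian_pdf(x, mu, Sigma):
        """Evaluate the d-dimensional Gaussian density of equation (20) at x."""
        d = len(mu)
        diff = x - mu
        norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(Sigma))
        return float(np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm)

    mu = np.array([0.0, 0.0])
    Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
    print(gaussian_pdf(np.array([1.0, -0.5]), mu, Sigma))

Where SciPy is available, scipy.stats.multivariate_normal(mu, Sigma).pdf(x) should give the same value.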

The function

    ∆²(x) = (x − µ)^T Σ^{−1} (x − µ)    (21)

is called the Mahalanobis distance. It is a better notion of the distance of the location x from the mean µ because it takes into account the uncertainty encoded in Σ. Contours (surfaces, hypersurfaces) of constant Mahalanobis distance in R^d are elliptical (ellipsoidal, hyperellipsoidal). The principal axes of the hyperellipsoids are given by the (unit) eigenvectors u_i of Σ.

In certain cases it is convenient to consider a simplified form of the Gaussian distribution in which Σ is a diagonal matrix,

    Σ = diag(σ_1², ..., σ_d²),    (22)

i.e. the components of x are statistically independent. The spectral decomposition of Σ produces a rotation matrix which can be applied to the coordinate system of x. In the new coordinate system, Σ will be diagonal. Whether or not this is a good thing to do depends on the application.

The central limit theorem states that under fairly general conditions the mean of M independent random variables tends to be distributed normally in the limit as M tends to infinity.

9 Conditional probability

The conditional probability of an event A assuming an event M is

    P(A | M) = P(A, M) / P(M),    (23)

where we assume that P(M) is not 0.

Bayes's Theorem: Given any events A and B,

    P(A | B) = P(B | A) P(A) / P(B).    (24)

The terms a priori and a posteriori are often used for the probabilities P(A) and P(A | B). We can prove Bayes's Theorem trivially from the definition of conditional probability.
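A sketch (assuming NumPy; the numbers are arbitrary examples) of the Mahalanobis distance of equation (21) and of the spectral decomposition mentioned above:

    import numpy as np

    def mahalanobis_sq(x, mu, Sigma):
        """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu), equation (21)."""
        diff = x - mu
        return float(diff @ np.linalg.solve(Sigma, diff))

    mu = np.array([0.0, 0.0])
    Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
    print(mahalanobis_sq(np.array([1.0, -0.5]), mu, Sigma))

    # Spectral decomposition: the columns of U are the unit eigenvectors of Sigma,
    # i.e. the principal axes of the constant-distance ellipses; in the rotated
    # coordinates y = U.T @ (x - mu) the covariance matrix becomes diag(eigvals).
    eigvals, U = np.linalg.eigh(Sigma)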
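And a small sketch (plain Python; the dice events are hypothetical examples, not from the notes) that checks the definition of conditional probability and Bayes's Theorem by enumerating a finite sample space:

    import itertools
    from fractions import Fraction

    # Finite sample space: ordered pairs of fair dice, all 36 outcomes equally likely.
    omega = list(itertools.product(range(1, 7), repeat=2))
    P = lambda event: Fraction(sum(1 for w in omega if event(w)), len(omega))

    A = lambda w: w[0] + w[1] == 7    # event A: the faces sum to seven
    B = lambda w: w[0] == 3           # event B: the first die shows three
    AB = lambda w: A(w) and B(w)      # the joint event (A, B)

    P_A_given_B = P(AB) / P(B)        # definition of conditional probability, (23)
    P_B_given_A = P(AB) / P(A)
    assert P_A_given_B == P_B_given_A * P(A) / P(B)   # Bayes's Theorem, (24)

    # These particular events also satisfy P(A, B) = P(A) P(B), i.e. they are
    # independent in the sense of equation (25) below.
    assert P(AB) == P(A) * P(B)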

Both conditional probability and Bayes's Theorem are easily visualized using Venn diagrams.

Two events A and B are independent if

    P(A, B) = P(A) P(B).    (25)

Note how Bayes's Theorem collapses when two events are independent. The concept of independence (and its complement) is fundamental, and the exercises in HW 1 will help give you a first insight. Later in the semester we will often review the concept of independence and its implications.

10 Further Reading

Most of the above topics are covered in introductory textbooks on probability and statistics, as well as in the background sections of recent books on pattern recognition and machine learning [Bis06, DHS01].

References

[Bis06] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[DHS01] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification. John Wiley and Sons, 2001.