Review (Probability & Linear Algebra)


Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani

Outline Axioms of probability theory; conditional probability, joint probability, Bayes rule; discrete and continuous random variables; probability mass and density functions; expected value, variance, standard deviation; expectation for two variables (covariance, correlation); some probability distributions; the Gaussian distribution; linear algebra.

Basic Probability Elements Sample space (Ω): the set of all possible outcomes (or worlds); outcomes are assumed to be mutually exclusive. An event is a set of outcomes (i.e., a subset of Ω). A random variable is a function defined over the sample space, e.g., Gender: Ω → {male, female}, Height: Ω → R.

Probability Space A probability space is defined as a triple (Ω, F, P): a sample space Ω that contains the set of all possible outcomes (outcomes are also called states of nature); a set F whose elements are called events, where the events are subsets of Ω and F should be a Borel field; and a probability measure P that assigns probabilities to events.

Probability and Logic Probability theory can be viewed as a generalization of propositional logic. Probabilistic logic: a is a propositional logic sentence; the agent's belief in a is no longer restricted to true, false, or unknown; P(a) can range from 0 to 1.

Probability Axioms (Kolmogorov) Axioms define a reasonable theory of uncertainty. Kolmogorov's probability axioms (in propositional notation): P(a) ≥ 0; P(a) = 1 if a is a tautology; P(a ∨ b) = P(a) + P(b) − P(a ∧ b).

Random Variables Random variables are the variables of probability theory. The domain of a random variable may be Boolean, discrete, or continuous. A probability distribution is the function describing the probabilities of the possible values of a random variable, e.g., P(X = 1), P(X = 2), …

Random Variables A random variable is a function that maps every outcome in Ω to a real (or complex) number, so that probabilities, expectation, variance, etc. can be defined conveniently as functions on (real) numbers. Example: a coin toss mapped onto the real line, with Head and Tail mapped to 0 and 1.

Probabilistic Inference Joint probability distribution: specifies the probability of every atomic event; any probability can be found from it by summing over atomic events. Prior and posterior probabilities: belief in the absence or presence of evidence. Bayes rule: used when it is difficult to compute P(a | b) directly but we have information about P(b | a). Independence: new evidence may be irrelevant.

Joint Probability Distribution The probability of all combinations of the values for a set of random variables. If two or more random variables are considered together, they can be described in terms of their joint probability. Example: the joint probability of a set of features (X1, X2, …, Xd).

Prior and Posterior Probabilities Prior or unconditional probabilities of propositions: belief in the absence of any other evidence, e.g., P(a) = 0.14, P(h) = 0.6. Posterior or conditional probabilities: belief in the presence of evidence; P(a | b) is the probability of a given that b is true, e.g., P(h | a).

Conditional Probability P(a | b) = P(a, b) / P(b), defined when P(b) > 0. Product rule (an alternative formulation): P(a, b) = P(a | b) P(b) = P(b | a) P(a). P(· | b) obeys the same rules as probabilities, e.g., P(a | b) + P(¬a | b) = 1.

Conditional Probability For statistically dependent variables, knowing the value of one variable may allow us to better estimate the other. All probabilities are in effect conditional probabilities, e.g., P(a) = P(a | Ω). Conditioning on b renormalizes the probability of the events that jointly occur with b.

Conditional Probability: Example Rolling a fair die, let a be the event that the outcome is an even number and b the event that the outcome is a prime number. Then P(a | b) = P(a, b) / P(b) = (1/6) / (1/2) = 1/3.
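The arithmetic above can be checked by enumerating the sample space directly; a minimal sketch (variable names are illustrative):

```python
from fractions import Fraction

# Enumerate the sample space of a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
a = {2, 4, 6}   # event a: outcome is even
b = {2, 3, 5}   # event b: outcome is prime

p_b = Fraction(len(b), len(outcomes))       # P(b) = 1/2
p_ab = Fraction(len(a & b), len(outcomes))  # P(a, b) = 1/6 (only outcome 2)
p_a_given_b = p_ab / p_b                    # P(a | b) = 1/3

print(p_a_given_b)
```

Using exact fractions avoids floating-point rounding in the probabilities.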

Chain Rule The chain rule is derived by successive application of the product rule: P(x1, …, xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) … P(xn | x1, …, x(n−1)).

Law of Total Probability If B1, …, Bn are mutually exclusive events whose union is Ω, and A is an event (a subset of Ω), then P(A) = P(A, B1) + P(A, B2) + … + P(A, Bn).

Independence Propositions a and b are independent iff P(a | b) = P(a), or equivalently P(b | a) = P(b), or equivalently P(a, b) = P(a) P(b). Knowing b tells us nothing about a (and vice versa).

Bayes Rule Bayes rule is obtained from the product rule P(a, b) = P(a | b) P(b) = P(b | a) P(a): P(a | b) = P(b | a) P(a) / P(b). In some problems it may be difficult to compute P(a | b) directly, yet we might have information about P(b | a).

Bayes Rule Often it is useful to derive the rule a bit further by expanding the denominator with the law of total probability: P(a | b) = P(b | a) P(a) / Σ_a' P(b | a') P(a').

Bayes Rule: Example Meningitis (m) and stiff neck (s): P(s | m) = 0.7, P(m) = 0.0002, P(s) = 0.01. Then P(m | s) = P(s | m) P(m) / P(s) = 0.7 × 0.0002 / 0.01 = 0.014.
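The same computation in code, using the numbers from the example above:

```python
# Numbers from the slide's meningitis / stiff-neck example.
p_s_given_m = 0.7     # P(s | m): stiff neck given meningitis
p_m = 0.0002          # P(m): prior probability of meningitis
p_s = 0.01            # P(s): prior probability of a stiff neck

# Bayes rule: P(m | s) = P(s | m) P(m) / P(s)
p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)    # ≈ 0.014
```

Note how a fairly reliable symptom (0.7) still yields a small posterior, because the prior P(m) is tiny.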

Bayes Rule: General Form For a partition a1, …, an of the sample space and evidence b: P(ai | b) = P(b | ai) P(ai) / Σ_j P(b | aj) P(aj).

Calculating Probabilities from a Joint Distribution Useful techniques: Marginalization: P(a) = Σ_b P(a, b). Conditioning: P(a) = Σ_b P(a | b) P(b). Normalization: P(X | e) = α P(X, e), where α is a normalizing constant.
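These techniques can be sketched on a toy joint distribution stored as a table; the variable names (Weather, Cavity) and numbers below are illustrative, not from the slides:

```python
# Toy joint distribution P(Weather, Cavity) as a dict keyed by value pairs.
joint = {
    ('sunny', True): 0.07, ('sunny', False): 0.63,
    ('rain',  True): 0.03, ('rain',  False): 0.27,
}

# Marginalization: P(Weather = sunny) = sum over Cavity of the joint entries.
p_sunny = sum(p for (w, c), p in joint.items() if w == 'sunny')

# Normalization: P(Cavity = true | Weather = sunny) = P(sunny, true) / P(sunny)
p_cavity_given_sunny = joint[('sunny', True)] / p_sunny

print(p_sunny, p_cavity_given_sunny)
```

The division by p_sunny is exactly the renormalization step described on the conditional-probability slide.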

Probability Mass Function (PMF) The PMF gives the probability of each value of a discrete random variable; each impulse magnitude equals the probability of the corresponding outcome. Properties: P(X = x) ≥ 0 and Σ_x P(X = x) = 1. Example: the PMF of a fair die is P(X = k) = 1/6 for k = 1, …, 6.

Probability Density Function (PDF) The PDF is defined for continuous random variables: p(x0) = lim_{Δ→0} P(x0 ≤ X ≤ x0 + Δ) / Δ, so the probability of the interval (x − Δ/2, x + Δ/2) is approximately p(x) Δ. Properties: p(x) ≥ 0 and ∫ p(x) dx = 1.

Cumulative Distribution Function (CDF) The CDF is defined as the integral of the PDF: F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} p_X(u) du (similarly defined for discrete variables, with summation instead of integration). Properties: non-decreasing, right-continuous, F_X(−∞) = 0, F_X(∞) = 1, p_X(x) = dF_X(x)/dx, and P(u < X ≤ v) = F_X(v) − F_X(u).

Distribution Statistics Basic descriptors of distributions: mean value; variance and standard deviation; moments; covariance and correlation.

Expected Value The expected (or mean) value is the weighted average of all possible values of a random variable. For a discrete random variable: E[X] = Σ_x x P(x), and E[g(X)] = Σ_x g(x) P(x). For a continuous random variable: E[X] = ∫ x p_X(x) dx, and E[g(X)] = ∫ g(x) p_X(x) dx.

Variance The variance measures how far the values of a random variable spread around the expected value: Var(X) = σ_X² = E[(X − E[X])²] = E[X²] − (E[X])². The standard deviation is the square root of the variance: σ_X = √Var(X).
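Both definitions can be exercised on the fair-die PMF from the earlier slide; a minimal sketch using exact fractions:

```python
from fractions import Fraction

# PMF of a fair die: P(X = k) = 1/6 for k = 1..6
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

mean = sum(k * p for k, p in pmf.items())              # E[X]
second_moment = sum(k**2 * p for k, p in pmf.items())  # E[X^2]
var = second_moment - mean**2                          # Var(X) = E[X^2] - (E[X])^2

print(mean, var)   # 7/2 and 35/12
```

The second formula for the variance (second moment minus squared mean) is usually the more convenient one to compute.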

Moments The n-th order moment of a random variable X is M_n = E[Xⁿ]; the normalized (central) n-th order moment is E[(X − μ_X)ⁿ]. The first-order moment is the mean value; the second-order moment equals the variance plus the square of the mean.

Correlation & Covariance Correlation: Corr(X, Y) = E[XY]; for discrete RVs, E[XY] = Σ_x Σ_y x y P(x, y). Covariance is the correlation of the mean-removed variables: Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[XY] − μ_X μ_Y.

Covariance: Example, = 0, = 0.9 31

Covariance Properties The covariance value shows the tendency of the pair of RVs to vary together: Cov(X, Y) > 0: X and Y tend to increase together. Cov(X, Y) < 0: Y tends to decrease when X increases. Cov(X, Y) = 0: no linear correlation between X and Y.

Orthogonal, Uncorrelated & Independent RVs Orthogonal random variables: E[XY] = 0. Uncorrelated random variables: Cov(X, Y) = 0. Independent random variables are always uncorrelated, but uncorrelated random variables are not necessarily independent.

Pearson's Product-Moment Correlation ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y), defined only if both σ_X and σ_Y are finite and nonzero. ρ shows the degree of linear dependence between X and Y, with −1 ≤ ρ ≤ 1 by the Cauchy–Schwarz inequality. ρ = 1 indicates a perfect positive linear relationship and ρ = −1 a perfect negative linear relationship.
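The extreme values ρ = ±1 are easy to demonstrate numerically with exactly linear data; a small sketch using NumPy (the sample values are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pos = 2 * x + 1      # perfect positive linear relationship
y_neg = -3 * x + 4     # perfect negative linear relationship

# np.corrcoef returns the 2x2 correlation matrix; [0, 1] is rho(X, Y).
rho_pos = np.corrcoef(x, y_pos)[0, 1]
rho_neg = np.corrcoef(x, y_neg)[0, 1]

print(rho_pos, rho_neg)   # ≈ 1.0 and ≈ -1.0
```

Note that ρ is invariant to the slope magnitude and intercept; only the sign of the linear relationship matters at the extremes.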

Pearson's Correlation: Examples [Figure: scatter plots illustrating various values of ρ(X, Y); source: Wikipedia]

Covariance Matrix If X = [X1, …, Xd]^T is a vector of random variables (a d-dimensional random vector), the covariance matrix indicates the tendency of each pair of RVs to vary together: μ = E[X], Σ = Cov(X) = E[(X − μ)(X − μ)^T], with entries Σ_ij = E[(X_i − μ_i)(X_j − μ_j)].

Covariance Matrix: Two Variables For two variables, Σ_12 = Σ_21 = Cov(X, Y). Examples: Σ = [[1, 0], [0, 1]] (uncorrelated) and Σ = [[1, 0.9], [0.9, 1]] (strongly correlated).

Covariance Matrix Σ_ij shows the covariance of X_i and X_j: Σ_ij = Cov(X_i, X_j) = E[(X_i − μ_i)(X_j − μ_j)].
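A sample covariance matrix can be computed from data and checked against the two structural properties used later (symmetry and positive semidefiniteness); the data below are illustrative:

```python
import numpy as np

# Rows are variables X1, X2; columns are observations.
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0]])   # X2 = 2 * X1: perfectly coupled

sigma = np.cov(data)   # 2x2 sample covariance matrix (rows = variables)

# The covariance matrix is symmetric ...
assert np.allclose(sigma, sigma.T)
# ... and positive semidefinite (eigenvalues >= 0, up to rounding).
assert np.all(np.linalg.eigvalsh(sigma) >= -1e-12)
print(sigma)
```

Because X2 = 2 X1 here, the matrix is rank-deficient: one eigenvalue is (numerically) zero.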

Sums of Random Variables For Z = X + Y: Mean: E[Z] = E[X] + E[Y]. Variance: Var(Z) = Var(X) + Var(Y) + 2 Cov(X, Y); if X and Y are independent, Var(Z) = Var(X) + Var(Y). Distribution: p_Z(z) = ∫ p_{X,Y}(x, z − x) dx; under independence this becomes the convolution p_Z(z) = ∫ p_X(x) p_Y(z − x) dx.
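For discrete variables the convolution is a finite sum, so all three facts can be verified exactly; a sketch for the sum of two independent fair dice:

```python
from fractions import Fraction
from itertools import product

# Exact PMF of Z = X + Y for two independent fair dice, via the discrete
# convolution p_Z(z) = sum_x p_X(x) p_Y(z - x).
die = {k: Fraction(1, 6) for k in range(1, 7)}
pz = {}
for x, y in product(die, die):
    pz[x + y] = pz.get(x + y, Fraction(0)) + die[x] * die[y]

mean_z = sum(z * p for z, p in pz.items())
var_z = sum(z**2 * p for z, p in pz.items()) - mean_z**2

# E[Z] = E[X] + E[Y] = 7; Var(Z) = Var(X) + Var(Y) = 35/12 + 35/12 = 35/6.
print(mean_z, var_z)
```

The cross term 2 Cov(X, Y) vanishes here because the two dice are independent.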

Some Famous Probability Mass and Density Functions Uniform, X ~ U(a, b): p(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise. Gaussian (normal), X ~ N(μ, σ²): p(x) = (1/√(2πσ²)) exp(−(x − μ)² / (2σ²)).

Some Famous Probability Mass and Density Functions Binomial, X ~ B(n, p): P(X = k) = C(n, k) p^k (1 − p)^(n−k). Exponential: p(x) = λ e^(−λx) for x ≥ 0, and 0 otherwise.

Gaussian (Normal) Distribution It is widely used to model the distribution of continuous variables: about 68% of the probability mass lies within [μ − σ, μ + σ] and about 95% within [μ − 2σ, μ + 2σ]. The standard normal distribution is N(0, 1).
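The 68% / 95% figures follow from the Gaussian CDF, which can be written in terms of the error function; a minimal sketch using only the standard library:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), expressed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Probability mass within one and two standard deviations of the mean.
within_1 = normal_cdf(1) - normal_cdf(-1)
within_2 = normal_cdf(2) - normal_cdf(-2)
print(within_1, within_2)   # ≈ 0.683 and ≈ 0.954
```

The exact values are about 0.6827 and 0.9545; the slide rounds them to 68% and 95%.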

Multivariate Gaussian Distribution For a vector x of Gaussian variables: p(x) = N(x; μ, Σ) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(−(1/2)(x − μ)^T Σ^(−1) (x − μ)), where μ = E[x] = [E(x1), …, E(xd)]^T and Σ = E[(x − μ)(x − μ)^T]. The Mahalanobis distance is r² = (x − μ)^T Σ^(−1) (x − μ). [Figure: surface plot of a bivariate Gaussian probability density over (x1, x2)]
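The density formula can be implemented directly and sanity-checked against the fact that, for Σ = I, the joint density factorizes into independent univariate standard normals; a sketch (the evaluation point is arbitrary):

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N(mu, sigma) evaluated at the d-dimensional point x."""
    d = len(mu)
    diff = x - mu
    r2 = diff @ np.linalg.inv(sigma) @ diff    # squared Mahalanobis distance
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * r2) / norm

mu = np.array([0.0, 0.0])
sigma = np.eye(2)                              # independent, unit variances

# With sigma = I the joint density is a product of univariate N(0, 1) pdfs.
p_joint = mvn_pdf(np.array([1.0, -1.0]), mu, sigma)
p_factored = (np.exp(-0.5) / np.sqrt(2 * np.pi)) ** 2
print(p_joint, p_factored)
```

For production use, computing Σ⁻¹ explicitly is avoided in favor of a Cholesky solve, but the direct form mirrors the slide's formula.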

Multivariate Gaussian Distribution The covariance matrix is always symmetric and positive semidefinite. A multivariate Gaussian is completely specified by d + d(d + 1)/2 parameters. Special cases of Σ: Σ = σ²I: independent random variables with the same variance (circularly symmetric Gaussian). Diagonal Σ: independent random variables with different variances.

Multivariate Gaussian Distribution: Level Surfaces The Gaussian density is constant on surfaces in x-space for which (x − μ)^T Σ^(−1) (x − μ) = c; these are hyper-ellipsoids. The principal axes of the hyper-ellipsoids are the eigenvectors of Σ. For a bivariate Gaussian, the curves of constant density are ellipses.

Bivariate Gaussian Distribution λ1 and λ2 are the eigenvalues of Σ and u1 and u2 are the corresponding eigenvectors, Σ u_i = λ_i u_i; the constant-density ellipses have axes along u1 and u2, with lengths proportional to √λ1 and √λ2.

Linear Transformations on a Multivariate Gaussian Whitening transform: with the eigen decomposition Σ = Φ Λ Φ^T, the transform A_w = Φ Λ^(−1/2) maps x to y = A_w^T (x − μ), which has identity covariance.
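The defining property A_w^T Σ A_w = I can be verified numerically for any symmetric positive definite Σ; a sketch (the example covariance is illustrative):

```python
import numpy as np

# A symmetric positive definite covariance matrix to be whitened.
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

lam, phi = np.linalg.eigh(sigma)     # sigma = phi @ diag(lam) @ phi.T
a_w = phi @ np.diag(lam ** -0.5)     # whitening transform A_w = Phi Lambda^(-1/2)

# y = A_w^T (x - mu) has identity covariance, since A_w^T Sigma A_w = I.
whitened = a_w.T @ sigma @ a_w
print(whitened)
```

np.linalg.eigh is preferred over np.linalg.eig here because it exploits symmetry and returns real, orthonormal eigenvectors.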

Gaussian Distribution Properties Some attractive properties of the Gaussian distribution: Marginal and conditional distributions of a Gaussian are also Gaussian. After a linear transformation, a Gaussian distribution is again Gaussian. There exists a linear transformation that diagonalizes the covariance matrix (the whitening transform); it converts the multivariate normal distribution into a spherical one. The Gaussian maximizes the entropy among distributions with a given mean and variance. The Gaussian is stable and infinitely divisible. Central Limit Theorem: some distributions can be approximated by a Gaussian when their parameter value is sufficiently large (e.g., the Binomial for large n).

Central Limit Theorem (under mild conditions) Suppose X_i (i = 1, …, N) are i.i.d. (independent, identically distributed) RVs with finite variances, and let S_N = Σ_{i=1}^{N} X_i be their sum. The distribution of S_N (suitably normalized) converges to a normal distribution as N increases, regardless of the distribution of the X_i. Example: X_i ~ uniform, i = 1, …, N.
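One deterministic way to watch the CLT act is to convolve a non-Gaussian PMF with itself many times and track the excess kurtosis, which is 0 for a normal distribution; a sketch using the fair die (the choice of 30 summands is arbitrary):

```python
from fractions import Fraction

def convolve(p, q):
    """Exact PMF of the sum of two independent discrete RVs."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, Fraction(0)) + px * qy
    return out

def excess_kurtosis(p):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    m = sum(k * v for k, v in p.items())
    var = sum((k - m) ** 2 * v for k, v in p.items())
    m4 = sum((k - m) ** 4 * v for k, v in p.items())
    return float(m4 / var**2) - 3.0

die = {k: Fraction(1, 6) for k in range(1, 7)}
s = dict(die)
for _ in range(29):            # S_30: exact PMF of the sum of 30 i.i.d. dice
    s = convolve(s, die)

print(excess_kurtosis(die), excess_kurtosis(s))
```

A single die is strongly non-Gaussian (excess kurtosis about −1.27), while the 30-die sum is already very close to 0, matching the theorem.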

Linear Algebra: Basic Definitions Matrix: A = [a_ij], an m × n array of entries a_ij for i = 1, …, m and j = 1, …, n. Matrix transpose: B = A^T with b_ij = a_ji for 1 ≤ i ≤ n, 1 ≤ j ≤ m. Symmetric matrix: a_ij = a_ji. Vector: a column a = [a1, …, an]^T.

Linear Mapping A linear function f satisfies f(x + y) = f(x) + f(y) and f(αx) = α f(x). In general, a matrix A ∈ R^(m×n) denotes a linear map f: R^n → R^m given by y = Ax, i.e., y_i = a_i1 x_1 + … + a_in x_n.

Linear Algebra: Basic Definitions Inner (dot) product: a^T b = Σ_{i=1}^{n} a_i b_i. Matrix multiplication: for A = [a_ij] (m × p) and B = [b_ij] (p × n), AB = C = [c_ij] (m × n), where c_ij is the inner product of the i-th row of A and the j-th column of B.

Inner Product Inner (dot) product: a^T b = Σ_{i=1}^{n} a_i b_i. Length (Euclidean norm) of a vector: ‖a‖ = √(a^T a) = √(Σ a_i²); a is normalized iff ‖a‖ = 1. Angle between vectors: cos θ = a^T b / (‖a‖ ‖b‖). Orthogonal vectors a and b: a^T b = 0. An orthonormal set of vectors u1, …, un satisfies u_i^T u_j = 1 if i = j and 0 otherwise.
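These definitions take only a few lines to implement; a sketch with a pair of vectors chosen to be orthogonal:

```python
import math

def dot(a, b):
    """Inner product a^T b."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length ||a|| = sqrt(a^T a)."""
    return math.sqrt(dot(a, a))

a = [3.0, 4.0]
b = [4.0, -3.0]

print(norm(a))     # 5.0
print(dot(a, b))   # 0.0, so a and b are orthogonal
theta = math.acos(dot(a, b) / (norm(a) * norm(b)))
print(math.degrees(theta))   # ≈ 90 degrees
```

Dividing a by its norm yields the normalized (unit-length) vector [0.6, 0.8].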

Linear Independence A set of vectors is linearly independent if no vector is a linear combination of the others: c1 a1 + c2 a2 + … + cn an = 0 implies c1 = c2 = … = cn = 0.

Matrix Determinant and Trace Determinant: det(AB) = det(A) det(B); cofactor expansion along row i gives det(A) = Σ_{j=1}^{n} a_ij A_ij for any i = 1, …, n, where the cofactor is A_ij = (−1)^(i+j) det(M_ij) and M_ij is the minor obtained by deleting row i and column j. Trace: tr(A) = Σ_{j=1}^{n} a_jj.

Matrix Inversion The inverse B = A^(−1) of an n × n matrix A satisfies AB = BA = I_n. A^(−1) exists iff det(A) ≠ 0 (A is nonsingular). Singular: det(A) = 0. Ill-conditioned: A is nonsingular but close to being singular. For a non-square matrix, the pseudo-inverse A# = (A^T A)^(−1) A^T satisfies A# A = I, provided A^T A is nonsingular.
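The pseudo-inverse formula can be checked against NumPy's built-in implementation for a tall matrix with full column rank; the example matrix is illustrative:

```python
import numpy as np

# A tall (non-square) matrix with full column rank.
a = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Pseudo-inverse A# = (A^T A)^(-1) A^T, valid because A^T A is nonsingular.
a_pinv = np.linalg.inv(a.T @ a) @ a.T

# A# is a left inverse: A# A = I, and it matches np.linalg.pinv.
print(a_pinv @ a)
```

np.linalg.pinv uses an SVD internally, so it also handles rank-deficient matrices where the (A^T A)^(−1) formula breaks down.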

Matrix Rank rank(A): the maximum number of linearly independent columns or rows of A. Full rank: rank(A) = min(m, n); a square matrix A is full rank iff A is nonsingular.

Eigenvectors and Eigenvalues Characteristic equation: det(A − λI) = 0, an n-th order polynomial with n roots λ1, …, λn. These satisfy tr(A) = Σ_{j=1}^{n} λ_j and det(A) = Π_{j=1}^{n} λ_j.
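The trace and determinant identities are easy to confirm numerically; a sketch on a small symmetric matrix (chosen arbitrarily):

```python
import numpy as np

a = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eigvals = np.linalg.eigvals(a)

# Sum of eigenvalues equals the trace; product equals the determinant.
print(eigvals.sum(), np.trace(a))         # both ≈ 5
print(eigvals.prod(), np.linalg.det(a))   # both ≈ 5
```

Here the eigenvalues are (5 ± √5)/2, so both the sum and the product happen to equal 5.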

Eigenvectors and Eigenvalues: Symmetric Matrix For a symmetric matrix, the eigenvectors corresponding to distinct eigenvalues are orthogonal, so they can be used to form an orthonormal set u1, …, un (u_i^T u_j = 1 if i = j and 0 otherwise).

Eigen Decomposition: Symmetric Matrix With A u_i = λ_i u_i for i = 1, …, n, collect the orthonormal eigenvectors into Φ = [u1, …, un] and the eigenvalues into Λ = diag(λ1, …, λn), so that A Φ = Φ Λ. Since Φ^T Φ = I, the eigen decomposition of a symmetric matrix is A = Φ Λ Φ^T.
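Both claims (orthonormal eigenvectors and the reconstruction A = Φ Λ Φ^T) can be verified directly; a sketch on an arbitrary symmetric matrix:

```python
import numpy as np

a = np.array([[4.0, 1.0],
              [1.0, 3.0]])     # symmetric

# eigh is specialized for symmetric matrices: real eigenvalues (ascending)
# and orthonormal eigenvectors in the columns of phi.
lam, phi = np.linalg.eigh(a)

# Eigenvectors form an orthonormal basis: Phi^T Phi = I.
print(phi.T @ phi)
# The decomposition reconstructs A: A = Phi Lambda Phi^T.
print(phi @ np.diag(lam) @ phi.T)
```

This decomposition is exactly what the whitening transform on the earlier Gaussian slide relies on.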

Positive Definite Matrix A symmetric matrix A is positive definite if x^T A x > 0 for all x ≠ 0. The eigenvalues of a positive definite matrix are positive: λ_i > 0 for all i.

Simple Vector Derivatives ∂(a^T x)/∂x = a; ∂(x^T A x)/∂x = (A + A^T) x, which equals 2Ax for symmetric A.