Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology


Review (probability, linear algebra). CE-717: Machine Learning, Sharif University of Technology. M. Soleymani, Fall 2012. Some slides have been adapted from Prof. H.R. Rabiee's and Prof. R. Gutierrez-Osuna's lectures.

Outline: Axioms of probability theory; conditional probability, joint probability, Bayes rule; discrete and continuous random variables; probability mass, density, and distribution functions; expected value, variance, standard deviation; expectation for two variables: covariance, correlation; linear algebra.

Basic probability elements. Sample space (Ω): the set of all possible outcomes (or worlds); outcomes are assumed to be mutually exclusive. An event is a set of outcomes (i.e., a subset of Ω). A random variable is a function defined over the sample space, e.g., Gender: Ω → {male, female}; Height: Ω → R.

Probability space. A triple (Ω, F, P): Ω is a nonempty set whose elements are called outcomes or states of nature; F is a set whose elements, called events, are subsets of Ω (F should be a Borel field); P is the probability measure.

Probability and logic. Probability theory can be viewed as a generalization of propositional logic. Probabilistic logic: a is a propositional logic sentence, but the belief of an agent in a is no longer restricted to true, false, or unknown; P(a) can range from 0 to 1.

Probability axioms (Kolmogorov). The axioms define a reasonable theory of uncertainty. Kolmogorov's probability axioms (in propositional notation): P(a) ≥ 0; P(a) = 1 if a is a tautology; P(a ∨ b) = P(a) + P(b) − P(a ∧ b).

Random variables. Random variables are the variables of probability theory. The domain of a random variable may be Boolean, discrete, or continuous. A probability distribution is the function describing the probabilities of the possible values of a random variable, e.g., P(X = 1) = p1, P(X = 2) = p2, …

Random variables. A random variable is a function that maps every outcome in Ω to a real (or complex) number, so that probabilities (and expectation, variance, …) can be defined conveniently as functions on real numbers. Example: a coin flip mapped to the real line as Head → 0, Tail → 1.

Probabilistic inference. Joint probability distribution: specifies the probability of every atomic event; every probability can be found from it (by summing over atomic events). Prior and posterior probabilities: belief in the absence or presence of evidence. Bayes rule: used when it is difficult to compute P(a|b) but we have information about P(b|a). Independence: new evidence may be irrelevant.

Joint probability distribution. The probability of all combinations of values for a set of random variables: if two or more random variables are considered together, they can be described in terms of their joint probability. Example: P(Day, Weather) is a 7 × 4 matrix of values, partially shown below.

              W=sunny   W=rainy   W=cloudy   W=snow
Day=Saturday   0.08      0.02      0.04       0.01
Day=Sunday     0.07      0.01      0.06       0.00
Day=Friday     0.09      0.02      0.03       0.01

Prior and posterior probabilities. Prior or unconditional probabilities of propositions express belief in the absence of any other evidence, e.g., P(a) = 0.14, P(Weather = sunny) = 0.6. Posterior or conditional probabilities express belief in the presence of evidence: P(a|b) is the probability of a given that b is true, e.g., P(Weather = sunny | Day = Friday).

Conditional probability. P(a|b) = P(a, b) / P(b), if P(b) > 0. Product rule (an alternative formulation): P(a, b) = P(a|b) P(b) = P(b|a) P(a). P(·|b) obeys the same rules as probabilities, e.g., P(a|b) + P(¬a|b) = 1.

Conditional probability. For statistically dependent variables, knowing the value of one variable may allow us to better estimate the other. All probabilities are in effect conditional probabilities, e.g., P(a) = P(a|Ω). Conditioning on b renormalizes the probability of the events that occur jointly with b.

Example. Rolling a fair die; a: the outcome is an even number; b: the outcome is a prime number. Then P(a|b) = P(a, b) / P(b) = (1/6) / (1/2) = 1/3.
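The die example can be checked by brute-force enumeration over the sample space; a minimal sketch using exact rational arithmetic:

```python
from fractions import Fraction

# Sample space of a fair die; each outcome has probability 1/6.
omega = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

even = {2, 4, 6}    # event a: the outcome is even
prime = {2, 3, 5}   # event b: the outcome is prime

p_b = sum(p for o in omega if o in prime)           # P(b) = 1/2
p_ab = sum(p for o in omega if o in even & prime)   # P(a, b) = P({2}) = 1/6
p_a_given_b = p_ab / p_b                            # P(a|b) = 1/3
```

Using `Fraction` instead of floats keeps the conditional probability exact.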

Chain rule. The chain rule is derived by successive application of the product rule: P(x1, …, xn) = P(xn | x1, …, xn−1) P(x1, …, xn−1) = P(xn | x1, …, xn−1) P(xn−1 | x1, …, xn−2) P(x1, …, xn−2) = … = P(x1) Π_{i=2..n} P(xi | x1, …, xi−1).

Law of total probability. If b1, …, bn are mutually exclusive and exhaustive events and a is an event (a subset of Ω), then P(a) = P(a|b1) P(b1) + P(a|b2) P(b2) + … + P(a|bn) P(bn).

Independence. Propositions a and b are independent iff P(a|b) = P(a) ⟺ P(b|a) = P(b) ⟺ P(a, b) = P(a) P(b). Knowing b tells us nothing about a (and vice versa).

Bayes' rule. Obtained from the product rule P(a, b) = P(a|b) P(b) = P(b|a) P(a): P(a|b) = P(b|a) P(a) / P(b). In some problems it may be difficult to compute P(a|b) directly, yet we might have information about P(b|a).

Bayes' rule: example. Meningitis (m) and stiff neck (s): P(s) = 0.01, P(s|m) = 0.7, P(m) = 0.0002. Then P(m|s) = P(s|m) P(m) / P(s) = 0.7 × 0.0002 / 0.01 = 0.014.
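A one-line computation reproduces the example (a sketch using the slide's numbers):

```python
# Bayes' rule: P(m|s) = P(s|m) P(m) / P(s).
p_s = 0.01          # prior probability of a stiff neck
p_s_given_m = 0.7   # likelihood of a stiff neck given meningitis
p_m = 0.0002        # prior probability of meningitis

p_m_given_s = p_s_given_m * p_m / p_s   # ≈ 0.014
```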

Bayes rule. Often it is useful to derive the rule a bit further: expanding the denominator with the law of total probability gives P(a|b) = P(b|a) P(a) / Σ_{a'} P(b|a') P(a'), so the denominator acts as a normalization constant over all values a' of a.

Calculating probabilities from the joint distribution. Useful techniques: Marginalization: P(a) = Σ_b P(a, b). Conditioning: P(a) = Σ_b P(a|b) P(b). Normalization: P(a|b) = P(a, b) / Σ_{a'} P(a', b).

Inference: example. Joint probability distribution P(Cavity, Toothache, Catch):

             toothache           ¬toothache
             catch    ¬catch     catch    ¬catch
  cavity     0.108    0.012      0.072    0.008
  ¬cavity    0.016    0.064      0.144    0.576

Conditional probabilities: P(¬cavity | toothache) = P(¬cavity, toothache) / P(toothache) = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4.
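Marginalization and conditioning on this table are just sums over the joint entries; a minimal NumPy sketch (assuming NumPy is available):

```python
import numpy as np

# Joint distribution from the slide: rows = cavity / ¬cavity,
# columns = (toothache,catch), (toothache,¬catch), (¬toothache,catch), (¬toothache,¬catch).
joint = np.array([
    [0.108, 0.012, 0.072, 0.008],   # cavity
    [0.016, 0.064, 0.144, 0.576],   # ¬cavity
])

# Marginalize: P(toothache) sums the first two columns over cavity.
p_toothache = joint[:, :2].sum()

# Condition: P(¬cavity | toothache).
p_not_cavity_given_toothache = joint[1, :2].sum() / p_toothache
```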

Probability Mass (Distribution) Function (PMF). For discrete random variables: a sum of impulses, where the magnitude of each impulse represents the probability of occurrence of that outcome. P_X(x) ≥ 0 and Σ_i P_X(x_i) = 1. Example I: rolling a fair die, P_X(x) = 1/6 for x = 1, …, 6.

Probability Mass Function (PMF) (cont'd). Example II: the sum of two fair dice, with PMF values 1/36, 2/36, …, 6/36, …, 2/36, 1/36 for sums 2 through 12. Note: the sum of all probabilities should be equal to ONE. (Why?)
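The two-dice PMF can be built by counting the 36 equally likely outcome pairs; a short sketch:

```python
from fractions import Fraction
from collections import Counter

# Count how many of the 36 ordered outcome pairs give each sum.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
pmf = {s: Fraction(c, 36) for s, c in counts.items()}
```

The triangular shape (peaking at 7 with probability 6/36) falls out of the counting, and the exact fractions make the sum-to-one check trivial.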

Probability Density Function (PDF). For continuous random variables: p_X(x0) = lim_{Δx→0} P(x0 < X ≤ x0 + Δx) / Δx, so P(x0 < X ≤ x0 + Δx) ≈ p_X(x0) Δx. Properties: p_X(x) ≥ 0 and ∫ p_X(x) dx = 1.

Cumulative Distribution Function (CDF). F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} p_X(u) du; it can be defined as the integral of the PDF, and is similarly defined for discrete variables. Properties: non-decreasing; right-continuous; F_X(−∞) = 0; F_X(∞) = 1; p_X(x) = dF_X(x)/dx; P(u < X ≤ v) = F_X(v) − F_X(u).

Distribution statistics. The most basic descriptors of a distribution include: mean value; variance and standard deviation; covariance; moments.

Expected value. Discrete variable: E[X] = Σ_x x p_X(x); expectation of a function: E[g(X)] = Σ_x g(x) p_X(x). Continuous variable: E[X] = ∫ x p_X(x) dx; E[g(X)] = ∫ g(x) p_X(x) dx.
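The discrete definitions translate directly into code; a sketch for the fair die:

```python
from fractions import Fraction

# PMF of a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X] = sum_x x p(x) and E[g(X)] = sum_x g(x) p(x), here with g(x) = x^2.
e_x = sum(x * p for x, p in pmf.items())       # E[X]   = 7/2
e_x2 = sum(x**2 * p for x, p in pmf.items())   # E[X^2] = 91/6
```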

Moments. The n-th order moment of a random variable X: M_n = E[X^n]. Normalized (central) form: E[(X − μ_X)^n]. The mean is the first-order moment; the variance is the second moment minus the square of the mean.

Variance. Var(X) = σ_X² = E[(X − μ_X)²] = E[X²] − μ_X²; the covariance of a random variable with itself. Standard deviation σ_X: the square root of the variance.

Correlation & covariance. Knowing about a random variable X, how much information do we gain about Y? (Linear dependency.) Correlation: Corr(X, Y) = E[XY]. Covariance: Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[XY] − μ_X μ_Y = Σ_x Σ_y (x − μ_X)(y − μ_Y) P(x, y). The correlation coefficient is the normalized covariance: ρ = Cov(X, Y) / (σ_X σ_Y).
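These quantities can be estimated from samples; a sketch (assuming NumPy, with an illustrative linear dependence y = 2x + noise, so the true ρ is 2/√5 ≈ 0.89):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)   # y depends linearly on x

# Sample covariance and correlation coefficient, per the definitions.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))   # Cov(X, Y) ≈ 2
rho = cov_xy / (x.std() * y.std())                  # rho ≈ 2/sqrt(5)
```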

Orthogonal, uncorrelated random variables. Orthogonal random variables: E[XY] = 0. Uncorrelated random variables: Cov(X, Y) = 0. Independent random variables are uncorrelated, but uncorrelated variables are not necessarily independent.

Covariance matrix: two variables. Σ = [ σ11  σ12 ; σ21  σ22 ] = [ σ1²  σ12 ; σ21  σ2² ], where σ11 = var(X), σ22 = var(Y), and σ12 = σ21 = Cov(X, Y).

Covariance: example (figure omitted).

Covariance properties. The covariance matrix indicates the tendency of each pair of features to vary together. If x_i and x_j tend to increase together, then σ_ij > 0; if x_i tends to decrease when x_j increases, then σ_ij < 0; if x_i and x_j are uncorrelated, then σ_ij = 0.

Covariance matrix. If x = (x1, …, xd)ᵀ is a vector of random variables (a d-dimensional random vector) with mean μ = E[x] = (E[x1], …, E[xd])ᵀ, then Σ = E[(x − μ)(x − μ)ᵀ], the d × d matrix whose (i, j) entry is E[(x_i − μ_i)(x_j − μ_j)].

Covariance matrix. Σ_ij denotes the covariance of x_i and x_j: Σ_ij = Cov(x_i, x_j) = E[(x_i − μ_i)(x_j − μ_j)].

Sums of random variables. z = x + y. Mean: E[z] = E[x] + E[y]. Variance: Var(z) = Var(x) + Var(y) + 2 Cov(x, y); if x, y are independent, Var(z) = Var(x) + Var(y). Distribution: p_z(z) = ∫ p_{x,y}(x, z − x) dx; if x, y are independent this becomes the convolution p_z(z) = ∫ p_x(x) p_y(z − x) dx.
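For independent discrete variables the convolution can be computed directly; a sketch recovering the two-dice PMF (assuming NumPy):

```python
import numpy as np

# PMF of one fair die over the values 1..6.
die = np.full(6, 1 / 6)

# PMF of z = x + y is the convolution of the two PMFs (values 2..12).
pmf_sum = np.convolve(die, die)

# P(z = 7): value 2 sits at index 0, so value 7 is index 5.
p_seven = pmf_sum[5]
```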

Some famous masses and densities. Uniform density: X ~ U(a, b), p(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise. Gaussian (normal) density: perhaps the most widely used distribution in all of science, fully defined by 2 parameters: X ~ N(μ, σ²), p(x) = (1 / (σ √(2π))) e^{−(x−μ)² / (2σ²)}.
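Both densities are short enough to write from their formulas; a stdlib-only sketch:

```python
import math

def uniform_pdf(x, a, b):
    """Density of U(a, b): 1/(b - a) on [a, b], and 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
```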

Some famous masses and densities. Binomial mass: X ~ B(n, p), P(X = k) = C(n, k) p^k (1 − p)^{n−k}. Exponential density: p(x) = λ e^{−λx} for x ≥ 0, and 0 for x < 0 (i.e., p(x) = λ e^{−λx} U(x), with U the unit step).

Gaussian (Normal) Density (figure omitted).

Multivariate normal density. x is a vector of d Gaussian variables: p(x) = N(μ, Σ) = (1 / ((2π)^{d/2} |Σ|^{1/2})) e^{−½ (x−μ)ᵀ Σ⁻¹ (x−μ)}, where μ = E[x] = (E[x1], …, E[xd])ᵀ and Σ = E[(x − μ)(x − μ)ᵀ]. Mahalanobis distance: r² = (x − μ)ᵀ Σ⁻¹ (x − μ).
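The density and the Mahalanobis distance can be evaluated directly from the formula; a sketch (assuming NumPy — in d = 2 with μ = 0 and Σ = I the density at the origin is 1/(2π)):

```python
import numpy as np

def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance r^2 = (x - mu)^T Sigma^{-1} (x - mu)."""
    d = x - mu
    return float(d @ np.linalg.solve(sigma, d))

def mvn_pdf(x, mu, sigma):
    """Density of N(mu, Sigma) in d dimensions."""
    d = len(mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * mahalanobis_sq(x, mu, sigma)) / norm
```

Using `np.linalg.solve` instead of explicitly inverting Σ is the numerically preferred way to apply Σ⁻¹.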

Bivariate normal densities. The level curves are ellipses: the principal axes are the eigenvectors of Σ, and the width in each direction is proportional to the square root of the corresponding eigenvalue.

Gaussian pdf properties. It is closed under linear transformation. The product of two Gaussians is also Gaussian. All marginal and conditional densities of a Gaussian are also Gaussian. Diagonalization of the covariance matrix: the rotated variables are independent. The Gaussian distribution is infinitely divisible.

Central Limit Theorem. Given a distribution with mean μ and variance σ², the sampling distribution of the mean approaches a normal distribution with mean μ and variance σ²/N as N, the number of samples, increases. CLT's large-sample properties: how large is large enough? Some books will tell you 30 is large enough.

Central limit theorem (loosely). Suppose X1, …, XN are i.i.d. (independent, identically distributed) random variables with finite variances, and let S_N = Σ_{i=1..N} X_i be their sum. The PDF of S_N converges to a normal distribution as N increases, regardless of the distribution of the X_i. Example: X_i ~ uniform, N = 1, …
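A quick simulation makes the theorem concrete; a sketch (assuming NumPy) averaging N = 100 uniform(0, 1) variables, whose sample means should be approximately N(1/2, (1/12)/N):

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw many batches of N uniform variables and average each batch.
N, trials = 100, 20_000
means = rng.uniform(size=(trials, N)).mean(axis=1)

emp_mean = means.mean()   # should be close to 1/2
emp_var = means.var()     # should be close to (1/12)/N = 1/1200
```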

Linear algebra. Matrix: A = [a_ij]_{m×n}, with entries a_11, a_12, …, a_mn. Matrix transpose: B = Aᵀ, b_ij = a_ji for 1 ≤ i ≤ n, 1 ≤ j ≤ m. Symmetric matrix: a_ij = a_ji. Vector: a column a = (a1, …, an)ᵀ, so aᵀ = [a1, …, an].

Matrix and vector product. Inner (dot) product: aᵀb = Σ_{i=1..n} a_i b_i. Matrix multiplication: for A = [a_ij]_{m×p} and B = [b_ij]_{p×n}, AB = C = [c_ij]_{m×n} with c_ij = row_i(A) · col_j(B).

Inner product. Inner (dot) product: aᵀb = Σ_i a_i b_i. Length (Euclidean norm) of a vector: ‖a‖ = √(aᵀa) = √(Σ_i a_i²); a is normalized iff ‖a‖ = 1. Angle between vectors: cos θ = aᵀb / (‖a‖ ‖b‖); geometrically, the inner product measures the projection of one vector onto the other. Orthogonal vectors a and b: aᵀb = 0. Orthonormal set of vectors a1, …, an: ‖a_i‖ = 1 and a_iᵀa_j = 0 for i ≠ j.
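These definitions map one-to-one onto NumPy operations; a sketch with two orthogonal vectors (assuming NumPy):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([4.0, -3.0])

dot = a @ b                                   # inner product: 3*4 + 4*(-3) = 0
norm_a = np.sqrt(a @ a)                       # Euclidean length of a: 5
cos_theta = dot / (norm_a * np.sqrt(b @ b))   # cosine of the angle: 0 -> orthogonal
```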

Linear independence. A set of vectors is linearly independent if no vector is a linear combination of the others: c1 a1 + c2 a2 + … + cn an = 0 ⟹ c1 = c2 = … = cn = 0.

Matrix determinant and trace. Determinant (cofactor expansion): det(A) = Σ_{j=1..n} a_ij A_ij for any i = 1, …, n, where A_ij = (−1)^{i+j} det(M_ij) and M_ij is the minor obtained by deleting row i and column j of A; det(AB) = det(A) det(B). Trace: tr[A] = Σ_{j=1..n} a_jj.

Matrix inversion. The inverse of A satisfies AB = BA = I with B = A⁻¹; it exists iff det(A) ≠ 0 (A is nonsingular). Singular: det(A) = 0. Ill-conditioned: A is nonsingular but close to being singular. Pseudo-inverse for a non-square A (when AᵀA is not singular): A# = (AᵀA)⁻¹Aᵀ, which satisfies A#A = I.
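The left pseudo-inverse formula can be verified numerically on a small tall matrix; a sketch (assuming NumPy, with an illustrative 3×2 full-rank A):

```python
import numpy as np

# A tall full-rank matrix, so A^T A is invertible.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Left pseudo-inverse A# = (A^T A)^{-1} A^T.
A_pinv = np.linalg.inv(A.T @ A) @ A.T

# A# A = I, but A A# is only a projection onto the column space of A.
check = A_pinv @ A
```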

Matrix rank. rank(A): the maximum number of linearly independent columns (or rows) of A; equivalently, the dimension of the largest square submatrix of A that has a nonzero determinant.

Matrix rank: properties. rank(A_{m×n}) ≤ min(m, n); A is full rank when rank(A) = min(m, n). A square matrix A_{n×n} has rank n iff A is nonsingular (det(A) ≠ 0).

Eigenvectors and eigenvalues. Characteristic equation: det(A − λI_n) = 0, an n-th order polynomial with n roots λ1, …, λn. Properties: tr[A] = Σ_{j=1..n} λ_j and det[A] = Π_{j=1..n} λ_j.

Eigenvectors and eigenvalues. A v_j = λ_j v_j, j = 1, …, n, with ‖v_j‖ = 1. For a symmetric A, the eigenvalues are real and the eigenvectors can be chosen orthonormal.

Eigen-decomposition: symmetric matrix. A symmetric A can be written as A = VΛVᵀ, where the columns of V are the orthonormal eigenvectors of A and Λ = diag(λ1, …, λn).
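The decomposition of a symmetric matrix can be checked numerically; a sketch (assuming NumPy, with an illustrative 2×2 symmetric A whose eigenvalues are 1 and 3):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is the routine for symmetric matrices: it returns real eigenvalues
# (ascending) and orthonormal eigenvectors as the columns of V.
lam, V = np.linalg.eigh(A)

# Reconstruct A = V diag(lambda) V^T.
reconstructed = V @ np.diag(lam) @ V.T
```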

Positive definite matrix. A symmetric A is positive definite iff xᵀAx > 0 for all x ≠ 0. The eigenvalues of a positive definite matrix are positive: λ_j > 0.

Some vector derivatives. Standard identities: ∂(aᵀx)/∂x = a and ∂(xᵀAx)/∂x = (A + Aᵀ)x.

Linear mapping. A linear function satisfies f(x + y) = f(x) + f(y) and f(αx) = α f(x). A matrix is a useful notion to encode a linear map: f(x) = Ax.