Lecture 3. Probability - Part 2. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. October 19, 2016

Outline
1 Common Continuous Distributions - Univariate
    Gaussian Distribution
    Degenerate PDFs
    Student's t Distribution
    Laplace Distribution
    Gamma Distribution
    Beta Distribution
    Pareto Distribution
2 Joint Probability Distributions - Multivariate
    Joint Probability Distributions
    Joint CDF and PDF
    Marginal PDF
    Conditional PDF and Independence
    Covariance and Correlation
    Correlation and Independence
    Common Multivariate Distributions

Gaussian (Normal) Distribution
X is a continuous RV with values $x \in \mathbb{R}$
$X \sim \mathcal{N}(\mu, \sigma^2)$, i.e. X has a Gaussian distribution or normal distribution
\[ \mathcal{N}(x \mid \mu, \sigma^2) \triangleq \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2}(x-\mu)^2} \quad (= p_X(x)) \]
mean $E[X] = \mu$
mode $\mu$
variance $\mathrm{var}[X] = \sigma^2$
precision $\lambda = 1/\sigma^2$
$(\mu - 2\sigma, \mu + 2\sigma)$ is the approx. 95% interval
$(\mu - 3\sigma, \mu + 3\sigma)$ is the approx. 99.7% interval
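The density and the two interval rules on this slide can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `normal_pdf` and `normal_interval_mass` are illustrative choices, not part of the lecture. It relies on the standard identity $P(|X - \mu| < k\sigma) = \mathrm{erf}(k/\sqrt{2})$.

```python
import math

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Density N(x | mu, sigma^2) as defined on the slide."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def normal_interval_mass(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any Gaussian, via the error function."""
    return math.erf(k / math.sqrt(2.0))

mass_2sigma = normal_interval_mass(2.0)  # close to 0.954, the "approx. 95%" interval
mass_3sigma = normal_interval_mass(3.0)  # close to 0.997, the "approx. 99.7%" interval
peak = normal_pdf(0.0)                   # density of N(0, 1) at its mode
```

The interval mass does not depend on $\mu$ or $\sigma^2$, which is why the same two percentages apply to every Gaussian.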

Gaussian (Normal) Distribution
[Figure: normal, bell-shaped curve showing the percentage of cases in 8 portions of the curve (0.13%, 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14%, 0.13%), with horizontal scales for standard deviations (-4σ to +4σ), cumulative percentages, percentiles, Z scores, T scores and stanines.]

Gaussian (Normal) Distribution
Why is the Gaussian distribution the most widely used in statistics?
1 easy to interpret: just two parameters $\theta = (\mu, \sigma^2)$
2 central limit theorem: the sum of a large number of independent random variables (with finite variance) has an approximately Gaussian distribution
3 least number of assumptions (maximum entropy) subject to the constraints of having mean $= \mu$ and variance $= \sigma^2$
4 simple mathematical form, easy to manipulate and implement
homework: show that
\[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2}(x-\mu)^2} \, dx = 1 \]
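The central limit theorem in point 2 can be illustrated with a small simulation. This is only a sketch under assumed settings (12 uniform terms, 20000 repetitions, a fixed seed), none of which come from the lecture: it standardizes sums of Uniform(0, 1) draws and measures how often they fall within two standard deviations of the mean.

```python
import random

def standardized_uniform_sum(n_terms, rng):
    """Sum n_terms Uniform(0, 1) draws, then subtract the mean and divide by the std."""
    s = sum(rng.random() for _ in range(n_terms))
    mean = n_terms * 0.5                 # each term has mean 1/2
    std = (n_terms / 12.0) ** 0.5        # each term has variance 1/12
    return (s - mean) / std

rng = random.Random(0)                   # fixed seed (an arbitrary choice) for repeatability
samples = [standardized_uniform_sum(12, rng) for _ in range(20000)]

# Fraction of standardized sums inside (-2, 2); a Gaussian would give about 0.954.
frac_within_2 = sum(abs(z) < 2.0 for z in samples) / len(samples)
```

Even though each term is uniform, the fraction should land near the Gaussian value, in line with the approximate 95% interval from the previous slide.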

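Degenerate PDFs
X is a continuous RV with values $x \in \mathbb{R}$
consider a Gaussian distribution with $\sigma^2 \to 0$
\[ \delta(x - \mu) \triangleq \lim_{\sigma^2 \to 0} \mathcal{N}(x \mid \mu, \sigma^2) \]
$\delta(x)$ is the Dirac delta function, with
\[ \delta(x) = \begin{cases} \infty & \text{if } x = 0 \\ 0 & \text{if } x \neq 0 \end{cases} \]
one has
\[ \int_{-\infty}^{\infty} \delta(x) \, dx = 1 \qquad \left( \lim_{\sigma^2 \to 0} \int_{-\infty}^{\infty} \mathcal{N}(x \mid \mu, \sigma^2) \, dx = 1 \right) \]
sifting property
\[ \int_{-\infty}^{\infty} f(x) \, \delta(x - \mu) \, dx = f(\mu) \]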
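The sifting property can be seen numerically by replacing $\delta(x - \mu)$ with a Gaussian of shrinking variance. The sketch below is an illustration under assumed values ($f = \cos$, $\mu = 0.7$, a trapezoid rule over $\mu \pm 10\sigma$); the helper name `smoothed_value` is hypothetical.

```python
import math

def normal_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def smoothed_value(f, mu, sigma2, n=20000):
    """Trapezoid approximation of the integral of f(x) N(x | mu, sigma2) over mu +/- 10 sigma."""
    sigma = math.sqrt(sigma2)
    a, b = mu - 10.0 * sigma, mu + 10.0 * sigma
    h = (b - a) / n
    g = lambda x: f(x) * normal_pdf(x, mu, sigma2)
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * h)
    return total * h

mu = 0.7
wide = smoothed_value(math.cos, mu, 1.0)      # broad Gaussian: noticeably different from cos(mu)
narrow = smoothed_value(math.cos, mu, 1e-6)   # nearly degenerate Gaussian: close to cos(mu)
```

As the variance shrinks, the integral picks out the value of $f$ at $\mu$, which is what the sifting property states in the limit.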

Student's t Distribution
X is a continuous RV with values $x \in \mathbb{R}$
$X \sim \mathcal{T}(\mu, \sigma^2, \nu)$, i.e. X has a Student's t distribution
\[ \mathcal{T}(x \mid \mu, \sigma^2, \nu) \propto \left[ 1 + \frac{1}{\nu} \left( \frac{x - \mu}{\sigma} \right)^2 \right]^{-\left(\frac{\nu + 1}{2}\right)} \quad (= p_X(x)) \]
scale parameter $\sigma^2 > 0$
degrees of freedom $\nu$
mean $E[X] = \mu$, defined if $\nu > 1$
mode $\mu$
variance $\mathrm{var}[X] = \frac{\nu \sigma^2}{\nu - 2}$, defined if $\nu > 2$

Student's t Distribution
[Figure: a comparison of $\mathcal{N}(0, 1)$, $\mathcal{T}(0, 1, 1)$ and $\mathrm{Lap}(0, 1/\sqrt{2})$; panels: PDF and log(PDF).]

Student's t Distribution: Pros
Why should we use the Student's t distribution? It is less sensitive to outliers than the Gaussian distribution.
[Figure: two panels, "without outliers" and "with outliers".]
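The robustness to outliers comes from the heavier tails of the Student's t density. The sketch below compares tail densities; it assumes the standard normalization constant $\Gamma(\frac{\nu+1}{2}) / (\Gamma(\frac{\nu}{2}) \sqrt{\nu\pi}\,\sigma)$, which the slides leave implicit behind the proportionality sign, and the function names are illustrative.

```python
import math

def student_t_pdf(x, mu=0.0, sigma2=1.0, nu=1.0):
    """Normalized Student's t density; the constant is the standard one, not given on the slide."""
    sigma = math.sqrt(sigma2)
    log_norm = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    z = (x - mu) / sigma
    return math.exp(log_norm - 0.5 * (nu + 1.0) * math.log1p(z * z / nu))

def normal_pdf(x, mu=0.0, sigma2=1.0):
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# Density ratio five scale units from the centre: T(0, 1, 1) versus N(0, 1).
ratio_at_5 = student_t_pdf(5.0, nu=1.0) / normal_pdf(5.0)
```

A point far from the centre is far less surprising under the t model than under the Gaussian, so a single outlier pulls a fitted t distribution much less.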

Laplace Distribution
X is a continuous RV with values $x \in \mathbb{R}$
$X \sim \mathrm{Lap}(\mu, b)$, i.e. X has a Laplace distribution
\[ \mathrm{Lap}(x \mid \mu, b) \triangleq \frac{1}{2b} \exp\left( -\frac{|x - \mu|}{b} \right) \quad (= p_X(x)) \]
scale parameter $b > 0$
mean $E[X] = \mu$
mode $\mu$
variance $\mathrm{var}[X] = 2b^2$
compared to the Gaussian distribution, the Laplace distribution
is more robust to outliers (see above)
puts more probability density at $\mu$ (see above)

Gamma Distribution
X is a continuous RV with values $x \in \mathbb{R}^+$ ($x > 0$)
$X \sim \mathrm{Ga}(a, b)$, i.e. X has a gamma distribution
\[ \mathrm{Ga}(x \mid a, b) \triangleq \frac{b^a}{\Gamma(a)} \, x^{a-1} e^{-xb} \quad (= p_X(x)) \]
shape $a > 0$
rate $b > 0$
the gamma function is
\[ \Gamma(x) \triangleq \int_0^{\infty} u^{x-1} e^{-u} \, du \]
where $\Gamma(x) = (x - 1)!$ for $x \in \mathbb{N}$ and $\Gamma(1) = 1$
mean $E[X] = \frac{a}{b}$
mode $\frac{a - 1}{b}$ (for $a \geq 1$)
variance $\mathrm{var}[X] = \frac{a}{b^2}$
N.B.: several distributions are just special cases of the gamma (e.g. exponential, Erlang, chi-squared)
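As a quick sanity check of these formulas, the sketch below evaluates the gamma density in log space, confirms the exponential special case $\mathrm{Ga}(x \mid 1, b) = b\,e^{-bx}$, and recovers the mean $a/b$ by numerical integration. The parameter values and the truncation of the integral at 40 are arbitrary assumptions for illustration.

```python
import math

def gamma_pdf(x, a, b):
    """Ga(x | a, b) with shape a and rate b, computed in log space for stability."""
    if x <= 0.0:
        return 0.0
    return math.exp(a * math.log(b) - math.lgamma(a) + (a - 1.0) * math.log(x) - b * x)

def trapezoid(f, lo, hi, n):
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

a, b = 3.0, 2.0
# E[X] approximated as the integral of x * Ga(x | a, b); the tail beyond 40 is negligible here.
mean_numeric = trapezoid(lambda x: x * gamma_pdf(x, a, b), 0.0, 40.0, 40000)

factorial_check = math.gamma(5)             # Gamma(5) = 4! = 24
exponential_case = gamma_pdf(0.5, 1.0, 2.0) # equals 2 * exp(-1) when a = 1
```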

Gamma Distribution
[Figure: left, some $\mathrm{Ga}(a, b = 1)$ distributions; right, an empirical PDF of some rainfall data.]

Inverse Gamma Distribution
X is a continuous RV with values $x \in \mathbb{R}^+$ ($x > 0$)
if $X \sim \mathrm{Ga}(a, b)$, then $\frac{1}{X} \sim \mathrm{IG}(a, b)$
$\mathrm{IG}(a, b)$ is the inverse gamma distribution
\[ \mathrm{IG}(x \mid a, b) \triangleq \frac{b^a}{\Gamma(a)} \, x^{-(a+1)} e^{-b/x} \quad (= p_X(x)) \]
mean $E[X] = \frac{b}{a - 1}$ (defined if $a > 1$)
mode $\frac{b}{a + 1}$
variance $\mathrm{var}[X] = \frac{b^2}{(a - 1)^2 (a - 2)}$ (defined if $a > 2$)

Beta Distribution
X is a continuous RV with values $x \in [0, 1]$
$X \sim \mathrm{Beta}(a, b)$, i.e. X has a beta distribution
\[ \mathrm{Beta}(x \mid a, b) = \frac{1}{B(a, b)} \, x^{a-1} (1 - x)^{b-1} \quad (= p_X(x)) \]
requirements: $a > 0$ and $b > 0$
the beta function is
\[ B(a, b) \triangleq \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)} \]
mean $E[X] = \frac{a}{a + b}$
mode $\frac{a - 1}{a + b - 2}$
variance $\mathrm{var}[X] = \frac{ab}{(a + b)^2 (a + b + 1)}$

Beta Distribution
beta distribution
\[ \mathrm{Beta}(x \mid a, b) = \frac{1}{B(a, b)} \, x^{a-1} (1 - x)^{b-1} \quad (= p_X(x)) \]
requirements: $a > 0$ and $b > 0$
if $a = b = 1$ then $\mathrm{Beta}(x \mid 1, 1) = \mathrm{Unif}(x \mid 0, 1)$, the uniform distribution on the interval $[0, 1] \subset \mathbb{R}$
this distribution can be used to represent a prior on a probability value to be estimated
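The uniform special case can be verified directly from the definition. This is a minimal sketch with illustrative function names; it only evaluates the density for $a, b \geq 1$, since for smaller parameters the density diverges at the endpoints and the naive power expression would fail at $x = 0$ or $x = 1$.

```python
import math

def beta_fn(a, b):
    """B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b), via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def beta_pdf(x, a, b):
    """Beta(x | a, b); intended for a, b >= 1 in this sketch."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    return x ** (a - 1.0) * (1.0 - x) ** (b - 1.0) / beta_fn(a, b)

def beta_mean(a, b):
    return a / (a + b)

flat_values = [beta_pdf(x, 1.0, 1.0) for x in (0.1, 0.5, 0.9)]  # constant density 1 on [0, 1]
peaked = beta_pdf(0.5, 2.0, 2.0)                                 # 0.25 / B(2, 2) = 1.5
```

With $a = b = 1$ the prior treats every probability value in $[0, 1]$ as equally plausible; larger $a$ and $b$ concentrate the prior around the mean $a / (a + b)$.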

Pareto Distribution
X is a continuous RV with values $x \in \mathbb{R}^+$ ($x > 0$)
$X \sim \mathrm{Pareto}(k, m)$, i.e. X has a Pareto distribution
\[ \mathrm{Pareto}(x \mid k, m) = k \, m^k \, x^{-(k+1)} \, \mathbb{I}(x \geq m) \quad (= p_X(x)) \]
as $k \to \infty$ the distribution approaches $\delta(x - m)$
mean $E[X] = \frac{km}{k - 1}$ (defined for $k > 1$)
mode $m$
variance $\mathrm{var}[X] = \frac{m^2 k}{(k - 1)^2 (k - 2)}$ (defined for $k > 2$)
this distribution is particularly useful for its long tail
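The mean formula can be checked by simulation. The sketch below draws Pareto samples by inverting the CDF $F(x) = 1 - (m/x)^k$ for $x \geq m$, which follows from integrating the density above; the sampler, the seed and the choice $k = 3$, $m = 1$ are assumptions made for illustration, not part of the lecture.

```python
import random

def pareto_sample(k, m, rng):
    """Inverse-CDF sampling: if U ~ Uniform(0, 1], then m / U**(1/k) ~ Pareto(k, m)."""
    u = 1.0 - rng.random()          # lies in (0, 1], which avoids division by zero
    return m / u ** (1.0 / k)

def pareto_mean(k, m):
    """Closed-form mean from the slide, valid for k > 1."""
    return k * m / (k - 1.0)

rng = random.Random(0)              # fixed seed, an arbitrary choice
k, m = 3.0, 1.0                     # k > 2 so that the variance is finite
samples = [pareto_sample(k, m, rng) for _ in range(100000)]
sample_mean = sum(samples) / len(samples)
```

Every sample is at least $m$, matching the indicator $\mathbb{I}(x \geq m)$. For smaller $k$ the tail becomes heavier and the sample mean converges more slowly, and for $k \leq 1$ the mean does not exist at all.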

Joint Probability Distributions
consider an ensemble of RVs $X_1, \ldots, X_D$
we can define a new RV $X \triangleq (X_1, \ldots, X_D)^T$
we are now interested in modeling the stochastic relationship between $X_1, \ldots, X_D$
in this case $x = (x_1, \ldots, x_D)^T \in \mathbb{R}^D$ denotes a particular value (instance) of X

Joint Cumulative Distribution Function: Definition
given a continuous RV X with values $x \in \mathbb{R}^D$
Cumulative Distribution Function (CDF)
\[ F(x) = F(x_1, \ldots, x_D) \triangleq P_X(X \leq x) = P_X(X_1 \leq x_1, \ldots, X_D \leq x_D) \]
properties:
$0 \leq F(x) \leq 1$
$F(x_1, \ldots, x_j, \ldots, x_D) \leq F(x_1, \ldots, x_j + \Delta x_j, \ldots, x_D)$ with $\Delta x_j > 0$
$\lim_{\Delta x_j \to 0^+} F(x_1, \ldots, x_j + \Delta x_j, \ldots, x_D) = F(x_1, \ldots, x_j, \ldots, x_D)$ (right-continuity)
$F(-\infty, \ldots, -\infty) = 0$
$F(+\infty, \ldots, +\infty) = 1$

Joint Probability Density Function: Definitions
given a continuous RV X with values $x \in \mathbb{R}^D$
Probability Density Function (PDF)
\[ p(x) = p(x_1, \ldots, x_D) \triangleq \frac{\partial^D F}{\partial x_1 \cdots \partial x_D} \]
we assume the above partial derivative of F exists
properties:
\[ F(x) = P_X(X \leq x) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_D} p(\xi_1, \ldots, \xi_D) \, d\xi_1 \cdots d\xi_D \]
\[ P_X(x < X \leq x + dx) \approx p(x) \, dx_1 \cdots dx_D = p(x) \, dx \]
\[ P_X(a < X \leq b) = \int_{a_1}^{b_1} \cdots \int_{a_D}^{b_D} p(x) \, dx_1 \cdots dx_D \]
N.B.: $p(x)$ acts as a density in the above computations

Joint Probability Density Function: Definitions
for a discrete RV X we have instead the Probability Mass Function (PMF):
\[ p(x) \triangleq P_X(X = x) \]
in the above properties we can remove $dx$ and replace integrals with sums
the CDF can be defined as
\[ F(x) \triangleq P_X(X \leq x) = \sum_{\xi_i \leq x} p(\xi_i) \]

Joint PDF: Some Properties
reconsider
1 $F(x) = P_X(X \leq x) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_D} p(\xi_1, \ldots, \xi_D) \, d\xi_1 \cdots d\xi_D$
2 $P_X(x < X \leq x + dx) \approx p(x) \, dx_1 \cdots dx_D = p(x) \, dx$
the first implies $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(x) \, dx = 1$ (consider $(x_1, \ldots, x_D) \to (\infty, \ldots, \infty)$)
the second implies $p(x) \geq 0$ for all $x \in \mathbb{R}^D$
it is possible that $p(x) > 1$ for some $x \in \mathbb{R}^D$

Joint PDF: Marginal PDF
suppose we want the PDF of $Q \triangleq (X_1, X_2, \ldots, X_{D-1})^T$ (we don't care about $X_D$)
\[ F_Q(q) = P_Q(Q \leq q) = P_X(X \leq (q, \infty)^T) \qquad (X_D \text{ can take any value in } (-\infty, \infty)) \]
\[ P_X(X \leq (q, \infty)^T) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_{D-1}} \int_{-\infty}^{\infty} p_X(x) \, dx_1 \cdots dx_{D-1} \, dx_D = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_{D-1}} \left( \int_{-\infty}^{\infty} p_X(x) \, dx_D \right) dx_1 \cdots dx_{D-1} \]
hence we can define the marginal PDF
\[ p_Q(q) = p_Q(x_1, \ldots, x_{D-1}) \triangleq \int_{-\infty}^{\infty} p_X(x_1, \ldots, x_D) \, dx_D \]
and one has
\[ F_Q(q) = F_Q(x_1, \ldots, x_{D-1}) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_{D-1}} p_Q(q) \, dx_1 \cdots dx_{D-1} \]
the above procedure can also be used to marginalize more variables
the above procedure can be used to obtain a marginal PMF for discrete variables by removing the $dx$ and replacing integrals with sums, i.e.
\[ p_Q(q) = p_Q(x_1, x_2, \ldots, x_{D-1}) \triangleq \sum_{x_D} p_X(x_1, x_2, \ldots, x_D) \]
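For discrete variables the marginalization rule is just a sum over the last coordinate. The following sketch applies it to a small joint PMF stored as a dictionary; the table values and the function name `marginalize_last` are made up for illustration.

```python
def marginalize_last(joint):
    """Sum a joint PMF over its last variable: p_Q(q) = sum over x_D of p_X(q, x_D)."""
    marginal = {}
    for outcome, prob in joint.items():
        q = outcome[:-1]                      # drop the last coordinate x_D
        marginal[q] = marginal.get(q, 0.0) + prob
    return marginal

# Hypothetical joint PMF over (X_1, X_2), each taking values in {0, 1}.
joint = {
    (0, 0): 0.1,
    (0, 1): 0.2,
    (1, 0): 0.3,
    (1, 1): 0.4,
}

p_x1 = marginalize_last(joint)                # PMF of Q = (X_1,)
```

Applying the same function repeatedly marginalizes out more variables, one at a time.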

Joint PDF: Conditional PDF and Independence
suppose we want the PDF of $Q \triangleq (X_1, X_2, \ldots, X_{D-1})^T$ given $X_D = x_D$
\[ P_{Q \mid X_D}(q < Q \leq q + dq \mid x_D < X_D \leq x_D + dx_D) = \frac{P_X(x < X \leq x + dx)}{P_{X_D}(x_D < X_D \leq x_D + dx_D)} \approx \frac{p_X(x) \, dx_1 \cdots dx_D}{p_{X_D}(x_D) \, dx_D} \]
hence
\[ P_{Q \mid X_D}(q < Q \leq q + dq \mid x_D < X_D \leq x_D + dx_D) \approx \frac{p_X(x)}{p_{X_D}(x_D)} \, dx_1 \cdots dx_{D-1} \]
we can define the conditional PDF
\[ p_{Q \mid X_D}(q \mid x_D) = p_{Q \mid X_D}(x_1, \ldots, x_{D-1} \mid x_D) \triangleq \frac{p_X(x)}{p_{X_D}(x_D)} \]
Q and $X_D$ are independent $\iff p_X(x) = p_Q(q) \, p_{X_D}(x_D)$
if Q and $X_D$ are independent then $p_{Q \mid X_D}(q \mid x_D) = p_Q(q)$
the above definitions can be generalized to define the conditional PDF w.r.t. more variables
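The discrete analogue of the conditional PDF divides the joint PMF by the marginal of the conditioning variable. The sketch below builds a joint PMF that factorizes by construction and checks that conditioning on $X_D$ then leaves the distribution of $Q$ unchanged; all names and numbers are illustrative assumptions.

```python
def conditional_given_last(joint, x_last):
    """p(q | x_D) = p(q, x_D) / p(x_D) for a joint PMF keyed by outcome tuples."""
    p_last = sum(prob for outcome, prob in joint.items() if outcome[-1] == x_last)
    return {outcome[:-1]: prob / p_last
            for outcome, prob in joint.items() if outcome[-1] == x_last}

# Hypothetical marginals; the joint is their product, so X_1 and X_2 are independent.
p1 = {0: 0.4, 1: 0.6}
p2 = {0: 0.25, 1: 0.75}
joint = {(a, b): p1[a] * p2[b] for a in p1 for b in p2}

cond = conditional_given_last(joint, 1)       # PMF of X_1 given X_2 = 1

# Under independence the conditional equals the marginal p1, up to rounding.
matches_marginal = all(abs(cond[(a,)] - p1[a]) < 1e-12 for a in p1)
```

For a joint table that does not factorize, the same function returns a conditional that changes with the conditioning value.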

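Covariance
covariance of two RVs X and Y
\[ \mathrm{cov}[X, Y] \triangleq E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y] \quad (= \mathrm{cov}[Y, X]) \]
if $x \in \mathbb{R}^D$, the mean value is
\[ E[x] \triangleq \int \cdots \int x \, p(x) \, dx = \begin{pmatrix} E[x_1] \\ \vdots \\ E[x_D] \end{pmatrix} \in \mathbb{R}^D \]
if $x \in \mathbb{R}^D$, the covariance matrix is
\[ \mathrm{cov}[x] \triangleq E[(x - E[x])(x - E[x])^T] = \begin{pmatrix} \mathrm{var}[X_1] & \mathrm{cov}[X_1, X_2] & \cdots & \mathrm{cov}[X_1, X_D] \\ \mathrm{cov}[X_2, X_1] & \mathrm{var}[X_2] & \cdots & \mathrm{cov}[X_2, X_D] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}[X_D, X_1] & \mathrm{cov}[X_D, X_2] & \cdots & \mathrm{var}[X_D] \end{pmatrix} \in \mathbb{R}^{D \times D} \]
N.B.: $\mathrm{cov}[x] = \mathrm{cov}[x]^T$ and $\mathrm{cov}[x] \succeq 0$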

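Correlation
correlation coefficient of two RVs X and Y
\[ \mathrm{corr}[X, Y] \triangleq \frac{\mathrm{cov}[X, Y]}{\sqrt{\mathrm{var}[X] \, \mathrm{var}[Y]}} \]
it can be shown that $0 \leq |\mathrm{corr}[X, Y]| \leq 1$ (homework, see footnote 1)
$\mathrm{corr}[X, Y] = 0 \iff \mathrm{cov}[X, Y] = 0$
if $x \in \mathbb{R}^D$, its correlation matrix is
\[ \mathrm{corr}[x] \triangleq \begin{pmatrix} \mathrm{corr}[X_1, X_1] & \mathrm{corr}[X_1, X_2] & \cdots & \mathrm{corr}[X_1, X_D] \\ \mathrm{corr}[X_2, X_1] & \mathrm{corr}[X_2, X_2] & \cdots & \mathrm{corr}[X_2, X_D] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{corr}[X_D, X_1] & \mathrm{corr}[X_D, X_2] & \cdots & \mathrm{corr}[X_D, X_D] \end{pmatrix} \in \mathbb{R}^{D \times D} \]
N.B.: $\mathrm{corr}[x] = \mathrm{corr}[x]^T$
footnote 1: use the fact that $\left( \int f(t) g(t) \, dt \right)^2 \leq \int f^2 \, dt \int g^2 \, dt$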
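The covariance and correlation matrices can be computed from data by replacing expectations with sample averages. The sketch below does this in plain Python; the use of the $1/N$ normalization (the expectation under the empirical distribution), the toy data and the function names are assumptions for illustration.

```python
import math

def mean_vector(samples):
    n, d = len(samples), len(samples[0])
    return [sum(s[j] for s in samples) / n for j in range(d)]

def covariance_matrix(samples):
    """Empirical cov[x] = E[(x - E[x])(x - E[x])^T] with 1/N averaging."""
    n, d = len(samples), len(samples[0])
    m = mean_vector(samples)
    return [[sum((s[i] - m[i]) * (s[j] - m[j]) for s in samples) / n
             for j in range(d)] for i in range(d)]

def correlation_matrix(cov):
    """corr[X_i, X_j] = cov[X_i, X_j] / sqrt(var[X_i] var[X_j])."""
    d = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(d)]
            for i in range(d)]

# Toy data: the second coordinate is exactly twice the first.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
cov = covariance_matrix(samples)
corr = correlation_matrix(cov)
```

Both matrices come out symmetric, and because the second coordinate is an exact linear function of the first, the off-diagonal correlation reaches the upper bound of 1.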

Correlation and Independence
Property 1: there is a linear relationship between X and Y iff $|\mathrm{corr}[X, Y]| = 1$, i.e.
\[ |\mathrm{corr}[X, Y]| = 1 \iff Y = aX + b \]
the correlation coefficient represents a degree of linear relationship

Correlation and Independence
Property 2: if X and Y are independent ($p(X, Y) = p(X) p(Y)$) then $\mathrm{cov}[X, Y] = 0$ and $\mathrm{corr}[X, Y] = 0$, i.e. $X \perp Y \implies \mathrm{corr}[X, Y] = 0$ (homework)
Property 3: the converse does not hold, i.e. $\mathrm{corr}[X, Y] = 0 \;\not\!\!\implies X \perp Y$
example: with $X \sim U(-1, 1)$ and $Y = X^2$ (quadratic dependency) one has $\mathrm{corr}[X, Y] = 0$ (homework)
[Figure: other examples where $\mathrm{corr}[X, Y] = 0$ but there is a clear dependence between X and Y.]
a more general measure of dependence is the mutual information
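One possible route through the quadratic example, written out as a short derivation. It uses only the covariance identity $\mathrm{cov}[X, Y] = E[XY] - E[X]E[Y]$ from the covariance slide and the fact that odd powers integrate to zero over a symmetric interval.

```latex
\begin{align*}
p(x) &= \tfrac{1}{2} \quad \text{for } x \in (-1, 1) \\
E[X] &= \int_{-1}^{1} \tfrac{1}{2}\, x \, dx = 0 \\
E[XY] &= E[X^3] = \int_{-1}^{1} \tfrac{1}{2}\, x^3 \, dx = 0 \\
\mathrm{cov}[X, Y] &= E[XY] - E[X]\,E[Y] = 0 - 0 \cdot E[Y] = 0 \\
\mathrm{corr}[X, Y] &= \frac{\mathrm{cov}[X, Y]}{\sqrt{\mathrm{var}[X]\,\mathrm{var}[Y]}} = 0
\end{align*}
```

Both variances are strictly positive here ($\mathrm{var}[X] = 1/3$ and $\mathrm{var}[Y] = 1/5 - 1/9 = 4/45$), so the correlation is well defined, yet Y is a deterministic function of X.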

Multivariate Gaussian (Normal) Distribution
X is a continuous RV with values $x \in \mathbb{R}^D$
$X \sim \mathcal{N}(\mu, \Sigma)$, i.e. X has a Multivariate Normal distribution (MVN) or multivariate Gaussian
\[ \mathcal{N}(x \mid \mu, \Sigma) \triangleq \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right] \]
mean: $E[x] = \mu$
mode: $\mu$
covariance matrix: $\mathrm{cov}[x] = \Sigma \in \mathbb{R}^{D \times D}$ where $\Sigma = \Sigma^T$ and $\Sigma \succ 0$
precision matrix: $\Lambda \triangleq \Sigma^{-1}$
spherical isotropic covariance with $\Sigma = \sigma^2 I_D$
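To make the formula concrete, the sketch below evaluates the MVN density for the special case $D = 2$, where the determinant and inverse of $\Sigma$ can be written out by hand. The function names and test values are hypothetical; for general $D$ a linear algebra library would be the natural choice.

```python
import math

def mvn_pdf_2d(x, mu, sigma):
    """N(x | mu, Sigma) for D = 2, with Sigma given as ((a, b), (c, d))."""
    (a, b), (c, d) = sigma
    det = a * d - b * c                                   # |Sigma|
    inv = ((d / det, -b / det), (-c / det, a / det))      # Sigma^{-1} for a 2x2 matrix
    dx = (x[0] - mu[0], x[1] - mu[1])
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def normal_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

sigma2 = 4.0
isotropic = ((sigma2, 0.0), (0.0, sigma2))                # Sigma = sigma^2 I_2
joint_density = mvn_pdf_2d((1.0, -1.0), (0.0, 0.0), isotropic)
product_density = normal_pdf(1.0, 0.0, sigma2) * normal_pdf(-1.0, 0.0, sigma2)
```

With a spherical covariance the joint density factorizes into a product of univariate Gaussians, so the two values agree up to rounding.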

Multivariate Gaussian (Normal) Distribution
[Figure: bivariate normal.]

Multivariate Student t Distribution
X is a continuous RV with values $x \in \mathbb{R}^D$
$X \sim \mathcal{T}(\mu, \Sigma, \nu)$, i.e. X has a Multivariate Student t distribution
\[ \mathcal{T}(x \mid \mu, \Sigma, \nu) \triangleq \frac{\Gamma(\nu/2 + D/2)}{\Gamma(\nu/2)} \, \frac{|\Sigma|^{-1/2}}{\nu^{D/2} \pi^{D/2}} \left[ 1 + \frac{1}{\nu} (x - \mu)^T \Sigma^{-1} (x - \mu) \right]^{-\left(\frac{\nu + D}{2}\right)} \]
mean: $E[x] = \mu$
mode: $\mu$
$\Sigma = \Sigma^T$ is now called the scale matrix
covariance matrix: $\mathrm{cov}[x] = \frac{\nu}{\nu - 2} \Sigma$
N.B.: this distribution is similar to the MVN but it is more robust w.r.t. outliers due to its fatter tails (see the previous slides about the univariate Student t distribution)

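Dirichlet Distribution
X is a continuous RV with values $x \in S_K$
probability simplex
\[ S_K \triangleq \left\{ x \in \mathbb{R}^K : 0 \leq x_i \leq 1, \; \sum_{i=1}^{K} x_i = 1 \right\} \]
the vector $x = (x_1, \ldots, x_K)$ can be used to represent a set of K probabilities
$X \sim \mathrm{Dir}(\alpha)$, i.e. X has a Dirichlet distribution
\[ \mathrm{Dir}(x \mid \alpha) \triangleq \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1} \, \mathbb{I}(x \in S_K) \]
where $\alpha \in \mathbb{R}^K$ and $B(\alpha)$ is a generalization of the beta function to K variables (see footnote 2)
\[ B(\alpha) = B(\alpha_1, \ldots, \alpha_K) \triangleq \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma(\alpha_0)}, \qquad \alpha_0 = \sum_{i=1}^{K} \alpha_i \]
\[ E[x_k] = \frac{\alpha_k}{\alpha_0}, \qquad \mathrm{mode}[x_k] = \frac{\alpha_k - 1}{\alpha_0 - K}, \qquad \mathrm{var}[x_k] = \frac{\alpha_k (\alpha_0 - \alpha_k)}{\alpha_0^2 (\alpha_0 + 1)} \]
N.B.: this distribution is a multivariate generalization of the beta distribution
footnote 2: see the slide about the gamma distribution for the definition of $\Gamma(\alpha)$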
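The mean formula can be checked by sampling. The sketch below uses a standard construction that is not stated on the slides: drawing independent $g_i \sim \mathrm{Ga}(\alpha_i, 1)$ and normalizing them by their sum yields a Dirichlet sample. The seed, the parameter vector and the function names are arbitrary choices for illustration.

```python
import random

def dirichlet_sample(alpha, rng):
    """Normalize independent Gamma(alpha_i, 1) draws; the result lies on the simplex S_K."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [gi / total for gi in g]

def dirichlet_mean(alpha):
    """E[x_k] = alpha_k / alpha_0 from the slide."""
    a0 = sum(alpha)
    return [a / a0 for a in alpha]

rng = random.Random(0)                        # fixed seed, an arbitrary choice
alpha = [2.0, 3.0, 5.0]
draws = [dirichlet_sample(alpha, rng) for _ in range(20000)]
empirical_mean = [sum(d[k] for d in draws) / len(draws) for k in range(len(alpha))]
exact_mean = dirichlet_mean(alpha)            # [0.2, 0.3, 0.5]
```

Each draw is a vector of K non-negative numbers summing to one, so it can be read directly as a set of K probabilities, as noted on the slide.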

Credits
Kevin Murphy's book