Multivariate probability distributions and linear regression


Patrik Hoyer

Contents:
- Random variable, probability distribution
- Joint distribution
- Marginal distribution
- Conditional distribution
- Independence, conditional independence
- Generating data
- Multivariate Gaussian distribution
- Multivariate linear regression
- Expectation, variance, covariance, correlation
- Estimating a distribution from sample data

Random variable:
- sample space (set of possible elementary outcomes)
- probability distribution over the sample space

Examples:
- The throw of a die:

  x:     1    2    3    4    5    6
  P(x): 1/6  1/6  1/6  1/6  1/6  1/6

- The sum of two dice:

  x:      2     3     4    5     6    7     8    9    10    11    12
  P(x): 1/36  1/18  1/12  1/9  5/36  1/6  5/36  1/9  1/12  1/18  1/36

- Two separate dice (red, blue):

  x:    (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)  (2,1)  (2,2)  (2,3)  ...  (6,6)
  P(x):  1/36   1/36   1/36   1/36   1/36   1/36   1/36   1/36   1/36  ...   1/36

Discrete variables:
- Finite number of states (e.g. the dice examples)
- Infinite number of states (e.g. how many heads before the first tails in a sequence of coin tosses?)

Continuous variables:
- Each particular state has a probability of zero, so we need the concept of a probability density:

  $P(X \le x) = \int_{-\infty}^{x} p(t)\,dt$

  (e.g. how long until the next bus arrives? what will be the price of oil a year from now?)

A probability distribution satisfies...
1. Probabilities are non-negative: $P(X = x) = P_X(x) = P(x) \ge 0$
2. Probabilities sum to one:

   $\sum_x P(x) = 1$ (discrete)
   $\int p(x)\,dx = 1$ (continuous)

[Note that in the discrete case this means that there exists no value of x such that P(x) > 1. However, this does not in general hold for a continuous density p(x)!]
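As a quick numerical illustration of that note (a sketch in numpy, not part of the original slides): the uniform density on [0, 0.5] equals 2 everywhere on its support, exceeding 1, yet it still integrates to one.

```python
import numpy as np

# Uniform density on [0, 0.5]: p(x) = 2 on the support (so p(x) > 1),
# yet the density still integrates to one.
xs = np.linspace(0.0, 0.5, 100001)
p = np.full_like(xs, 2.0)

total_mass = np.trapz(p, xs)  # numerical integral of p(x) dx
print(total_mass)             # ~1.0, even though p(x) = 2 > 1 everywhere
```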

The joint distribution of two random variables:
- Let X and Y be random variables. Their joint distribution is

  $P(x, y) = P(X = x \text{ and } Y = y)$

- Example: Two coin tosses, X denotes the first throw, Y the second (note: independence!)

  P(x,y):        Y=H    Y=T
        X=H     0.25   0.25
        X=T     0.25   0.25

- Example: X: Rain today? Y: Rain tomorrow?

  P(x,y):        Y=yes   Y=no
        X=yes    0.5     0.2
        X=no     0.1     0.2

Marginal distribution:
- Interested in, or observing, only one of the two variables
- The distribution is obtained by summing (or integrating) over the other variable:

  $P(x) = \sum_y P(x, y)$ (discrete)
  $p(x) = \int p(x, y)\,dy$ (continuous)

- Example (continued): What is the probability of rain tomorrow? That is, what is P(y)? Summing the columns of the joint table gives P(y): 0.6 (rain), 0.4 (no rain). In the same fashion, we can calculate that the chance of rain today is 0.7.

Conditional distribution:
- If we observe X = x, how does that affect our belief about the value of Y?
- Obtained by selecting the appropriate row/column of the joint distribution, and renormalizing it to sum to one:

  $P(y \mid X = x) = P(y \mid x) = \frac{P(x, y)}{P(x)}$
  $p(y \mid x) = \frac{p(x, y)}{p(x)}$

- Example (continued): What is the probability that it rains tomorrow, given that it does not rain today? That is, what is P(y | X = "no rain")?

  P(Y = yes | X = no) = 0.1 / (0.1 + 0.2) ≈ 0.33
  P(Y = no  | X = no) = 0.2 / (0.1 + 0.2) ≈ 0.67
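The rain example can be worked through with a short numpy sketch (the array layout and variable names are my own): the joint distribution is a 2x2 table, marginals are row/column sums, and conditioning selects a row and renormalizes it.

```python
import numpy as np

# Joint distribution P(x, y) of the rain example.
# Rows: X = rain today (yes, no); columns: Y = rain tomorrow (yes, no).
P_xy = np.array([[0.5, 0.2],
                 [0.1, 0.2]])

P_x = P_xy.sum(axis=1)  # marginal P(x): [0.7, 0.3] -> rain today 0.7
P_y = P_xy.sum(axis=0)  # marginal P(y): [0.6, 0.4] -> rain tomorrow 0.6

# Conditional P(y | X = "no rain"): select the X = no row, renormalize.
P_y_given_no_rain = P_xy[1] / P_xy[1].sum()
print(P_y_given_no_rain)  # [0.333..., 0.666...]
```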

Chain rule:

$P(x, y) = P(x)\,P(y \mid x) = P(y)\,P(x \mid y)$
$p(x, y) = p(x)\,p(y \mid x) = p(y)\,p(x \mid y)$

So the joint distribution can be specified directly, or by way of a marginal and a conditional distribution (one can even choose which way to specify it).

Independence:
Two random variables are independent if and only if knowing the value of one does not change our belief about the other:

$\forall x: P(y \mid x) = P(y)$
$\forall y: P(x \mid y) = P(x)$

This is equivalent to being able to write the joint distribution as the product of the marginals:

$P(x, y) = P(x)\,P(y)$

We write this as $X \perp Y$, or, if we want to explicitly specify the distribution, $(X \perp Y)_P$.

Example: two coin tosses...
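Independence can be checked numerically by comparing the joint table to the outer product of its marginals; a minimal sketch (the function name is mine):

```python
import numpy as np

def is_independent(P_xy, tol=1e-12):
    """True iff the joint table factorizes into the product of its marginals."""
    P_x = P_xy.sum(axis=1)
    P_y = P_xy.sum(axis=0)
    return np.allclose(P_xy, np.outer(P_x, P_y), atol=tol)

coins = np.array([[0.25, 0.25], [0.25, 0.25]])  # two fair coin tosses
rain = np.array([[0.5, 0.2], [0.1, 0.2]])       # the rain example
print(is_independent(coins))  # True
print(is_independent(rain))   # False
```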

Three or more variables:
- joint distribution: ("multidimensional array/function")
- marginal distributions, e.g.:

  $P(x) = \sum_{v,w,y,z,\dots} P(v, w, x, y, z, \dots)$
  $P(x, y) = \sum_{v,w,z,\dots} P(v, w, x, y, z, \dots)$

- conditional distributions, e.g.:

  $P(x \mid v, w, y, z, \dots) = P(v, w, x, y, z, \dots) / P(v, w, y, z, \dots)$
  $P(x, y \mid v, w, z, \dots) = P(v, w, x, y, z, \dots) / P(v, w, z, \dots)$
  $P(v, w, y, z, \dots \mid x) = P(v, w, x, y, z, \dots) / P(x)$
  $P(x \mid y) = \sum_{v,w,z,\dots} P(v, w, x, z, \dots \mid y)$  (both marginal and conditional)

- Chain rule:

  $P(v, w, x, y, z, \dots) = P(v)\,P(w \mid v)\,P(x \mid v, w)\,P(y \mid v, w, x)\,P(z \mid v, w, x, y)\,P(\dots \mid v, w, x, y, z)$

- Complete independence between all variables if and only if:

  $P(v, w, x, y, z, \dots) = P(v)\,P(w)\,P(x)\,P(y)\,P(z)\,P(\dots)$

- Conditional independence (e.g. if we know the value of z, then x does not give any additional information about y):

  $P(x, y \mid z) = P(x \mid z)\,P(y \mid z)$

  This is also written $X \perp Y \mid Z$, or, explicitly noting the distribution, $(X \perp Y \mid Z)_P$.

- In general, we can say that marginal distributions are conditional on not knowing the value of other variables: $P(x) = P(x \mid \emptyset)$, and (marginal) independence is independence conditional on not observing other variables: $P(x, y \mid \emptyset) = P(x \mid \emptyset)\,P(y \mid \emptyset)$
- Example of conditional independence: drownings and ice-cream sales. These are mutually dependent (both happen during warm weather) but are, at least approximately, conditionally independent given the weather.

Example: conditional dependence. Two coin tosses and a bell that rings whenever they get the same result. The coins are marginally independent, but conditionally dependent given the bell!

X: first coin toss, Y: second coin toss, Z: bell

  P(x, y):             Y=H    Y=T
             X=H      0.25   0.25
             X=T      0.25   0.25     (independent)

  P(x, y | Z = "bell rang"):
                       Y=H    Y=T
             X=H      0.5    0
             X=T      0      0.5      (dependent!)
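The bell example can be verified in the same style (a sketch; the three-dimensional joint table is built directly from the story above):

```python
import numpy as np

# Joint P(x, y, z): axes are X (first toss: H=0, T=1), Y (second toss),
# Z (bell: 0 = rang, 1 = silent). The bell rings iff the tosses match.
P = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        z = 0 if x == y else 1
        P[x, y, z] = 0.25

P_xy = P.sum(axis=2)  # marginal over the bell
print(P_xy)           # uniform 0.25 everywhere -> X and Y independent

P_xy_given_rang = P[:, :, 0] / P[:, :, 0].sum()
print(P_xy_given_rang)  # 0.5 on the diagonal -> dependent given the bell!
```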

Data generation, sampling:
- Given some P(x), how can we draw samples (generate data) from that distribution? Answer: divide the unit interval [0, 1] into parts whose lengths correspond to the probabilities P(x_1), P(x_2), ..., draw a uniformly distributed number in [0, 1], and select the state into whose part it falls (in the slide's figure, the draw 0.30245... lands in the part belonging to x_2, so we set X := x_2).
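This unit-interval recipe is inverse-CDF sampling for a discrete distribution; a minimal numpy sketch (the function name is my own):

```python
import numpy as np

def sample_discrete(probs, n, rng):
    """Draw n states by partitioning [0, 1] according to probs."""
    edges = np.cumsum(probs)          # right edges of the interval parts
    u = rng.random(n)                 # uniform draws in [0, 1]
    return np.searchsorted(edges, u)  # part into which each draw falls

rng = np.random.default_rng(0)
samples = sample_discrete([0.1, 0.5, 0.2, 0.2], 100000, rng)
print(np.bincount(samples) / len(samples))  # ~ [0.1, 0.5, 0.2, 0.2]
```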

Given a joint distribution P(x, y, z), how can we draw samples (generate data)?
- We could list all joint states, then proceed as above, or...
- Draw data sequentially from conditional distributions:
  1. First draw x from P(x)
  2. Next draw y from P(y | x)
  3. Finally draw z from P(z | x, y)

Note: We can freely choose any ordering of the variables!
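A sketch of this sequential (ancestral) scheme for the two-variable rain example, first drawing x from P(x) and then y from P(y | x); extending it with a third draw from P(z | x, y) follows the same pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
P_xy = np.array([[0.5, 0.2],
                 [0.1, 0.2]])      # the rain example joint
P_x = P_xy.sum(axis=1)             # marginal P(x)
P_y_given_x = P_xy / P_x[:, None]  # row i holds the conditional P(y | x=i)

def draw_pair():
    x = rng.choice(2, p=P_x)             # 1. draw x from P(x)
    y = rng.choice(2, p=P_y_given_x[x])  # 2. draw y from P(y | x)
    return x, y

counts = np.zeros((2, 2))
for _ in range(100000):
    x, y = draw_pair()
    counts[x, y] += 1
print(counts / counts.sum())  # empirical frequencies approach P_xy
```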

Example (continued): two coin tosses and a bell that rings if and only if the two tosses give the same result
- we can draw all the variables simultaneously by listing all the joint states, calculating their probabilities, placing them on the unit interval, and then drawing the joint state
- we can first independently generate the coin tosses, then assign the bell
- we can first draw one coin toss and the bell, and then assign the second coin toss

Numerical random variables:
- Expectation:

  $E\{X\} = \sum_x x\,P(x)$ (discrete)
  $E\{X\} = \int x\,p(x)\,dx$ (continuous)

- Variance: $\mathrm{Var}(X) = \sigma_X^2 = \sigma_{XX} = E\{(X - E\{X\})^2\}$
- Covariance: $\mathrm{Cov}(X, Y) = \sigma_{XY} = E\{(X - E\{X\})(Y - E\{Y\})\}$
- Correlation coefficient: $\rho_{XY} = \dfrac{\sigma_{XY}}{\sqrt{\sigma_X^2\,\sigma_Y^2}}$
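All four quantities computed from a discrete joint table (a sketch; I give X and Y the numeric values 0 and 1, reusing the rain table as numbers):

```python
import numpy as np

P_xy = np.array([[0.5, 0.2],
                 [0.1, 0.2]])
xs = np.array([0.0, 1.0])  # numeric values of X (rows)
ys = np.array([0.0, 1.0])  # numeric values of Y (columns)

P_x, P_y = P_xy.sum(axis=1), P_xy.sum(axis=0)
E_x, E_y = xs @ P_x, ys @ P_y          # expectations
var_x = ((xs - E_x) ** 2) @ P_x        # variance of X
var_y = ((ys - E_y) ** 2) @ P_y        # variance of Y
cov_xy = ((xs - E_x)[:, None] * (ys - E_y)[None, :] * P_xy).sum()
rho = cov_xy / np.sqrt(var_x * var_y)  # correlation coefficient
print(E_x, var_x, cov_xy, rho)
```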

- Multivariate numerical random variables (random vectors)

  Expectation:

  $E\{\mathbf{V}\} = \begin{pmatrix} E\{V_1\} \\ E\{V_2\} \\ \vdots \\ E\{V_N\} \end{pmatrix}$

  Covariance matrix ("variance-covariance matrix"):

  $C_\mathbf{V} = \Sigma_\mathbf{V} = E\{(\mathbf{V} - E\{\mathbf{V}\})(\mathbf{V} - E\{\mathbf{V}\})^T\} = \begin{pmatrix} \mathrm{Var}(V_1) & \cdots & \mathrm{Cov}(V_1, V_N) \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}(V_N, V_1) & \cdots & \mathrm{Var}(V_N) \end{pmatrix} = \begin{pmatrix} \sigma_{V_1 V_1} & \cdots & \sigma_{V_1 V_N} \\ \vdots & \ddots & \vdots \\ \sigma_{V_N V_1} & \cdots & \sigma_{V_N V_N} \end{pmatrix}$
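Estimated from sample data, these are np.mean and np.cov (a sketch with synthetic data):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal((10000, 3))  # 10000 samples of a 3-dim random vector
V[:, 2] += V[:, 0]                   # make components 0 and 2 correlated

mean_hat = V.mean(axis=0)            # estimate of E{V}
cov_hat = np.cov(V, rowvar=False)    # estimate of the covariance matrix
print(mean_hat)
print(cov_hat)  # diagonal: variances; off-diagonal: covariances
```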

Conditional expectation, variance, covariance, correlation:
- Conditional expectation (note: a function of y!):

  $E\{X \mid Y = y\} = \sum_x x\,P(x \mid y)$ (discrete)
  $E\{X \mid Y = y\} = \int x\,p(x \mid y)\,dx$ (continuous)

- Conditional variance (note: a function of y!):

  $\mathrm{Var}(X \mid Y = y) = \sigma^2_{X \mid y} = \sigma_{XX \mid y} = E_{P(X \mid Y = y)}\{(X - E\{X \mid y\})^2\}$

- Conditional covariance (note: a function of z!):

  $\mathrm{Cov}(X, Y \mid z) = \sigma_{XY \mid z} = E_{P(X, Y \mid Z = z)}\{(X - E\{X \mid z\})(Y - E\{Y \mid z\})\}$

- Conditional correlation coefficient (note: a function of z!):

  $\rho_{XY \mid z} = \dfrac{\sigma_{XY \mid z}}{\sqrt{\sigma^2_{X \mid z}\,\sigma^2_{Y \mid z}}}$

- Multivariate Gaussian ("normal") density:

  $p(\mathbf{x}) = \mathcal{N}(\boldsymbol{\mu}, \Sigma) = (2\pi)^{-d/2}\,|\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$

  has the following properties:
  - the mean vector $\boldsymbol{\mu}$ and the covariance matrix $\Sigma$ are its only parameters

  [Figure: contour plot of a two-dimensional Gaussian density over $x_1$, $x_2$]
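The density formula translated directly into numpy (a sketch; scipy.stats.multivariate_normal provides the same computation ready-made):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density N(mu, Sigma) evaluated at the point x."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
print(gaussian_pdf(np.array([0.0, 0.0]), mu, Sigma))  # density at the mean
```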

  - all marginal and conditional distributions are also Gaussian, and the conditional (co)variances do not depend on the values of the conditioning variables:

  Let $\mathbf{x}$ and $\mathbf{y}$ be random vectors of dimensions n and m. If they are joined together into one random vector $\mathbf{z} = (\mathbf{x}^T, \mathbf{y}^T)^T$ of dimension n + m, then its mean $\mathbf{m}_z$ and covariance matrix $C_z$ are

  $\mathbf{m}_z = \begin{pmatrix} \mathbf{m}_x \\ \mathbf{m}_y \end{pmatrix}, \quad C_z = \begin{pmatrix} C_x & C_{xy} \\ C_{yx} & C_y \end{pmatrix}, \quad (1)$

  where $\mathbf{m}_x$ and $\mathbf{m}_y$ are the means of $\mathbf{x}$ and $\mathbf{y}$, $C_x$ and $C_y$ are the covariance matrices of $\mathbf{x}$ and $\mathbf{y}$ respectively, and $C_{xy}$ contains the cross-covariances. If $\mathbf{z}$ is multivariate Gaussian, then $\mathbf{x}$ and $\mathbf{y}$ are also Gaussian. Additionally, the conditional distributions $p(\mathbf{x} \mid \mathbf{y})$ and $p(\mathbf{y} \mid \mathbf{x})$ are Gaussian. The latter's mean and covariance matrix are

  $\mathbf{m}_{y \mid x} = \mathbf{m}_y + C_{yx} C_x^{-1} (\mathbf{x} - \mathbf{m}_x) \quad (2)$
  $C_{y \mid x} = C_y - C_{yx} C_x^{-1} C_{xy} \quad (3)$
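Equations (2) and (3) in code (a sketch; the block structure of the inputs follows equation (1)):

```python
import numpy as np

def condition_gaussian(m_x, m_y, C_x, C_y, C_xy, x_obs):
    """Mean and covariance of p(y | x) for jointly Gaussian (x, y)."""
    C_yx = C_xy.T
    K = C_yx @ np.linalg.inv(C_x)          # C_yx C_x^{-1}
    m_y_given_x = m_y + K @ (x_obs - m_x)  # equation (2)
    C_y_given_x = C_y - K @ C_xy           # equation (3)
    return m_y_given_x, C_y_given_x

m_x, m_y = np.array([0.0]), np.array([1.0])
C_x, C_y, C_xy = np.array([[2.0]]), np.array([[1.0]]), np.array([[0.8]])
print(condition_gaussian(m_x, m_y, C_x, C_y, C_xy, np.array([1.5])))
```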

For the Gaussian distribution, the conditional variance, conditional covariance, and conditional correlation coefficient are known as the partial variance $\sigma^2_{X \cdot Z}$, partial covariance $\sigma_{XY \cdot Z}$, and partial correlation coefficient $\rho_{XY \cdot Z}$, respectively.

These can of course always be computed directly from the covariance matrix (regardless of whether the distribution actually is Gaussian!)...

...but they can only be safely interpreted as conditional variance, conditional covariance, and conditional correlation coefficient (respectively) for the Gaussian distribution.

For the Gaussian:

  zero partial covariance ⟺ zero conditional covariance ⟺ (conditional) independence,
  i.e. $(\sigma_{XY \cdot Z} = 0) \Leftrightarrow (\forall z: \sigma_{XY \mid z} = 0) \Leftrightarrow (X \perp Y \mid Z)$

In general, we only have a one-way implication:

  (conditional) independence ⟹ zero conditional covariance,
  i.e. $(X \perp Y \mid Z) \Rightarrow (\forall z: \sigma_{XY \mid z} = 0)$

Note, however, that conditional independence does not imply zero partial covariance in the completely general case!

Linear regression:

$\hat{y} = r_{yx}\,x + b_y$

Fit a line through the data, explaining how y varies with x. Minimize the sum of squared errors between $\hat{y}$ and y. The resulting slope is

$r_{yx} = \dfrac{\sigma_{XY}}{\sigma_X^2}$

Probabilistic interpretation: $\hat{y} \approx E\{Y \mid X = x\}$ (note that this is true only for roughly linear relationships)

[Figure: scatter plot of (x, y) data with the fitted regression line]
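The slope formula can be checked against an ordinary least-squares fit (a sketch with synthetic, roughly linear data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = 2.0 * x + rng.standard_normal(1000)  # roughly linear relationship

cov_xy = np.cov(x, y)[0, 1]
r_yx = cov_xy / np.var(x, ddof=1)  # slope from the covariance formula
slope_ls = np.polyfit(x, y, 1)[0]  # slope from a least-squares fit
print(r_yx, slope_ls)              # the two agree (~2.0)
```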

Note the symmetry: we could equally well regress x on y!

$\hat{x} = r_{xy}\,y + b_x$

[Figure: the same scatter plot with the x-on-y regression line]

Multivariate linear regression:

$\hat{z} = a\,x + b\,y + c$, with $a = r_{zx \cdot y} = \dfrac{\sigma_{ZX \cdot Y}}{\sigma^2_{X \cdot Y}}$

Note that the partial regression coefficient $r_{zx \cdot y}$ is NOT, in general, the same as the coefficient $r_{zx}$ obtained by regressing z on x while ignoring y.

Note also that $r_{zx \cdot y}$ is derived from the partial (co)variances. This holds regardless of the form of the underlying distribution.

[Figure: 3D scatter plot of (x, y, z) data with the fitted regression plane]
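A sketch checking the claim: the partial (co)variances are computed from the covariance matrix in the style of equation (3), and the resulting coefficient matches a full multiple regression, while regressing z on x alone gives a different number.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
y = 0.5 * x + rng.standard_normal(5000)            # x and y are correlated
z = 1.0 * x + 2.0 * y + rng.standard_normal(5000)

S = np.cov(np.vstack([x, y, z]))                   # 3x3 covariance matrix
s_xx, s_xy, s_yy = S[0, 0], S[0, 1], S[1, 1]
s_zx, s_zy = S[2, 0], S[2, 1]

# Partial (co)variances given y, in the style of equation (3):
s_zx_dot_y = s_zx - s_zy * s_xy / s_yy
s_xx_dot_y = s_xx - s_xy ** 2 / s_yy
a = s_zx_dot_y / s_xx_dot_y                        # partial regression coeff.

# Compare with the x coefficient of the multiple regression z ~ x + y:
A = np.column_stack([x, y, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, z, rcond=None)
print(a, coef[0])   # both ~1.0

print(s_zx / s_xx)  # regressing z on x alone: ~2.0, not 1.0
```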