The Multivariate Gaussian Distribution [DRAFT]

David S. Rosenberg

Abstract

This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs, because I don't know how to do it without either being very tedious and technical, or using some mathematics that is beyond the prerequisites.

To my knowledge, there are two primary approaches to developing the theory of multivariate Gaussian distributions. The first, and by far the most common approach in machine learning textbooks, is to define the multivariate Gaussian distribution in terms of its density function, and to derive results by manipulating these density functions. With this approach, a lot of the work turns out to be elaborate matrix algebra calculations happening inside the exponent of the Gaussian density. One issue with this approach is that the multivariate Gaussian density is only defined when the covariance matrix is invertible. To keep the derivations rigorous, some care must be taken to justify that the new covariance matrices we come up with are invertible. For my taste, the rigor in our textbooks is a bit light on these points. We've included the proof of Theorem 4 to give a flavor of the details one should add.

The second major approach to multivariate Gaussian distributions does not use density functions at all and does not require invertible covariance matrices. This approach is much cleaner and more elegant, but it relies on the theory of characteristic functions and the Cramér-Wold device to get started, and these are beyond the prerequisites for this course. You can often find this development in more advanced probability and statistics books, such as Chapter 8 of Rao's excellent Linear Statistical Inference and Its Applications.

1 Multivariate Gaussian Density

A random vector $x \in \mathbb{R}^d$ has a $d$-dimensional multivariate Gaussian distribution with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$ if its density is given by
\[
\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right),
\]
where $|\Sigma|$ denotes the determinant of $\Sigma$. Note that this expression requires that the covariance matrix $\Sigma$ be invertible.[1] Sometimes we will rewrite the factor in front of the exp as $|2\pi\Sigma|^{-1/2}$, which follows from basic facts about determinants.

[1] We can have a $d$-dimensional Gaussian distribution with a non-invertible $\Sigma$, but such a distribution will not have a density on $\mathbb{R}^d$, and we will not address that case here.

Exercise 1. There are at least two claims implicit in this definition. First, that the expression given is, in fact, a density, i.e. it is non-negative and integrates to 1. Second, that the density corresponds to a distribution with mean $\mu$ and covariance $\Sigma$, as claimed.

2 Recognizing a Gaussian Density

If we come across a density function of the form $p(x) \propto e^{-q(x)/2}$, where $q(x)$ is a positive definite quadratic function, then $p(x)$ is the density for a Gaussian distribution. More precisely, we have the following theorem:

Theorem 2. Consider the quadratic function $q(x) = x^T \Lambda x - 2 b^T x + c$, for any symmetric positive definite $\Lambda \in \mathbb{R}^{d \times d}$, any $b \in \mathbb{R}^d$, and $c \in \mathbb{R}$. If $p(x)$ is a density function with $p(x) \propto e^{-q(x)/2}$, then $p(x)$ is a multivariate Gaussian density with mean $\Lambda^{-1} b$ and covariance $\Lambda^{-1}$. That is,
\[
p(x) = \frac{|\Lambda|^{1/2}}{(2\pi)^{d/2}} \exp\left( -\frac{1}{2} (x - \Lambda^{-1} b)^T \Lambda (x - \Lambda^{-1} b) \right).
\]
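As a quick numerical sanity check of the density formula and of Theorem 2 (this snippet is only an illustration; the dimension and the particular $\Lambda$, $b$, and $c$ below are arbitrary choices), we can compare the formula against scipy.stats.multivariate_normal and confirm that $e^{-q(x)/2}$ is proportional to the claimed Gaussian density:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 3

# Arbitrary symmetric positive definite Lambda, plus b and c (illustrative values).
B = rng.standard_normal((d, d))
Lam = B @ B.T + d * np.eye(d)
b = rng.standard_normal(d)
c = 2.7

mu = np.linalg.solve(Lam, b)    # claimed mean:  Lambda^{-1} b
Sigma = np.linalg.inv(Lam)      # claimed covariance:  Lambda^{-1}

def q(x):
    """Quadratic q(x) = x^T Lambda x - 2 b^T x + c from Theorem 2."""
    return x @ Lam @ x - 2 * b @ x + c

# e^{-q(x)/2} should be proportional to N(x | mu, Sigma): the ratio is the same
# constant at every point x, so the normalized density is exactly N(mu, Sigma).
xs = rng.standard_normal((5, d))
ratios = [np.exp(-q(x) / 2) / multivariate_normal.pdf(x, mean=mu, cov=Sigma) for x in xs]
print(np.allclose(ratios, ratios[0]))   # True

# Direct check of the density formula from Section 1 against scipy.
x = xs[0]
norm_const = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
direct = np.exp(-0.5 * (x - mu) @ np.linalg.solve(Sigma, x - mu)) / norm_const
print(np.isclose(direct, multivariate_normal.pdf(x, mean=mu, cov=Sigma)))  # True
```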

Note: The inverse of the covariance matrix is called the precision matrix. Precision matrices of multivariate Gaussians have some interesting properties. (To do: explain that this is the Gaussian density in "information form" or "canonical form"; cf. Murphy.)

Proof. Completing the square, we have
\[
q(x) = x^T \Lambda x - 2 b^T x + c = (x - \Lambda^{-1} b)^T \Lambda (x - \Lambda^{-1} b) - b^T \Lambda^{-1} b + c.
\]
Since the last two terms are independent of $x$, when we exponentiate $-q(x)/2$ they can be absorbed into the constant of proportionality. That is,
\[
e^{-q(x)/2} = \exp\left( -\frac{1}{2} (x - \Lambda^{-1} b)^T \Lambda (x - \Lambda^{-1} b) \right) \exp\left( -\frac{1}{2} \left( c - b^T \Lambda^{-1} b \right) \right) \propto \exp\left( -\frac{1}{2} (x - \Lambda^{-1} b)^T \Lambda (x - \Lambda^{-1} b) \right).
\]
Now recall that the density function for the multivariate Gaussian distribution $\mathcal{N}(\mu, \Sigma)$ is
\[
\phi(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right).
\]
Thus we see that $p(x)$ must also be a Gaussian density, with covariance $\Sigma = \Lambda^{-1}$ and mean $\Lambda^{-1} b$.
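The completing-the-square identity used in the proof is easy to check numerically at a few random points (again with arbitrary illustrative choices of $\Lambda$, $b$, and $c$):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# Arbitrary symmetric positive definite Lambda, plus b and c (illustrative values).
B = rng.standard_normal((d, d))
Lam = B @ B.T + d * np.eye(d)
b = rng.standard_normal(d)
c = -1.3

Lam_inv_b = np.linalg.solve(Lam, b)   # Lambda^{-1} b, the mean in Theorem 2

for _ in range(5):
    x = rng.standard_normal(d)
    q = x @ Lam @ x - 2 * b @ x + c
    # Completed square: (x - Lam^{-1} b)^T Lam (x - Lam^{-1} b) - b^T Lam^{-1} b + c
    completed = (x - Lam_inv_b) @ Lam @ (x - Lam_inv_b) - b @ Lam_inv_b + c
    print(np.isclose(q, completed))   # True at every point
```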

3 Conditional Distributions (Bishop, Section 2.3)

Let $x \in \mathbb{R}^d$ have a Gaussian distribution: $x \sim \mathcal{N}(\mu, \Sigma)$. Let's partition the random variables in $x$ into two pieces:
\[
x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},
\]
where $x_1 \in \mathbb{R}^{d_1}$, $x_2 \in \mathbb{R}^{d_2}$, and $d = d_1 + d_2$. Similarly, we'll partition the mean vector, the covariance matrix, and the precision matrix as
\[
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \qquad
\Lambda = \Sigma^{-1} = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix},
\]
where $\mu_1 \in \mathbb{R}^{d_1}$, $\Sigma_{12} \in \mathbb{R}^{d_1 \times d_2}$, $\Lambda_{12} \in \mathbb{R}^{d_1 \times d_2}$, etc. Note that by the symmetry of the covariance matrix $\Sigma$, we have $\Sigma_{12} = \Sigma_{21}^T$. Note also that $\Sigma_{11}$ and $\Sigma_{22}$ are spd since $\Sigma$ is (trivial from the definition of spd).

When $x = (x_1, x_2)$ has a Gaussian distribution, we say that $x_1$ and $x_2$ are jointly Gaussian. Can we conclude anything about the marginal distributions of $x_1$ and $x_2$? Indeed, the following theorem states that they are individually Gaussian:

Theorem 3. Let $x$, $\mu$, and $\Sigma$ be as defined above. Then the marginal distributions of $x_1$ and $x_2$ are each Gaussian, with
\[
x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11}) \qquad \text{and} \qquad x_2 \sim \mathcal{N}(\mu_2, \Sigma_{22}).
\]
Proof. See Bishop, Section 2.3.2, p. 88. This can be done by showing that the marginal density $p(x_1) = \int p(x_1, x_2)\, dx_2$ has the form claimed, and similarly for $x_2$.

So when $x_1$ and $x_2$ are jointly Gaussian, we know that $x_1$ and $x_2$ are also marginally Gaussian. It turns out that the conditional distributions $x_1 \mid x_2$ and $x_2 \mid x_1$ are also Gaussian:

Theorem 4. Let $x$, $\mu$, and $\Sigma$ be as defined above. Assume that $\Sigma_{22}$ is positive definite.[2] Then the distribution of $x_1$ given $x_2$ is multivariate normal. More specifically, $x_1 \mid x_2 \sim \mathcal{N}(\mu_{1|2}, \Sigma_{1|2})$, where
\[
\mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2), \qquad
\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.
\]
Note that $\Sigma_{22}$ is positive definite, and thus invertible, since $\Sigma$ is positive definite.

Proof. See Bishop, Section 2.3.1.

[2] In fact, this is implied by our assumption that $\Sigma$ is positive definite.
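Theorems 3 and 4 can be checked numerically: with an arbitrary (illustrative) mean and positive definite covariance, the claimed conditional density $\mathcal{N}(x_1 \mid \mu_{1|2}, \Sigma_{1|2})$ agrees with $p(x_1, x_2)/p(x_2)$ pointwise. A small sketch:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
d1, d2 = 2, 3
d = d1 + d2

# Arbitrary mean and symmetric positive definite covariance (illustrative values).
mu = rng.standard_normal(d)
B = rng.standard_normal((d, d))
Sigma = B @ B.T + d * np.eye(d)

# Partition mu and Sigma as in Section 3.
mu1, mu2 = mu[:d1], mu[d1:]
S11, S12 = Sigma[:d1, :d1], Sigma[:d1, d1:]
S21, S22 = Sigma[d1:, :d1], Sigma[d1:, d1:]

# Condition on an arbitrary value of x2.
x2 = rng.standard_normal(d2)
mu_1_given_2 = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)    # Theorem 4
Sigma_1_given_2 = S11 - S12 @ np.linalg.solve(S22, S21)      # Theorem 4

# Compare against p(x1 | x2) = p(x1, x2) / p(x2) at random x1 values.
for _ in range(5):
    x1 = rng.standard_normal(d1)
    joint = multivariate_normal.pdf(np.concatenate([x1, x2]), mean=mu, cov=Sigma)
    marg2 = multivariate_normal.pdf(x2, mean=mu2, cov=S22)    # Theorem 3
    claimed = multivariate_normal.pdf(x1, mean=mu_1_given_2, cov=Sigma_1_given_2)
    print(np.isclose(joint / marg2, claimed))   # True at every point
```

Using np.linalg.solve rather than explicitly forming $\Sigma_{22}^{-1}$ is the usual numerically safer way to apply these formulas.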

Example. Consider a standard regression framework in which we are building a predictive model for $x_1 \in \mathbb{R}$ given $x_2 \in \mathbb{R}^{d_2}$. Recall that if we are using a square loss, then the Bayes optimal prediction function is $f(x_2) = \mathbb{E}[x_1 \mid x_2]$. If we assume that $x_1$ and $x_2$ are jointly Gaussian with a positive definite covariance matrix, then Theorem 4 tells us that
\[
\mathbb{E}[x_1 \mid x_2] = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2).
\]
Of course, in practice we don't know $\mu$ and $\Sigma$. Nevertheless, what's interesting is that the Bayes optimal prediction function is an affine function of $x_2$, i.e. a linear function plus a constant. Thus if we think that our input vector $x_2$ and our response variable $x_1$ are jointly Gaussian, there's no reason to go beyond a hypothesis space of affine functions of $x_2$. In other words, linear regression is all we need.

4 Joint Distribution from Marginal + Conditional

In Section 3, we found that if $x_1$ and $x_2$ are jointly Gaussian, then $x_2$ is marginally Gaussian and the conditional distribution $x_1 \mid x_2$ is also Gaussian, where the mean is a linear function of $x_2$. The following theorem shows that we can go in the reverse direction as well.

Theorem 5. Suppose $x_1 \sim \mathcal{N}(\mu_1, \Sigma_1)$ and $x_2 \mid x_1 \sim \mathcal{N}(A x_1 + b, \Sigma_{2|1})$, for some $\mu_1 \in \mathbb{R}^{d_1}$, $\Sigma_1 \in \mathbb{R}^{d_1 \times d_1}$, $A \in \mathbb{R}^{d_2 \times d_1}$, $b \in \mathbb{R}^{d_2}$, and $\Sigma_{2|1} \in \mathbb{R}^{d_2 \times d_2}$. Then $x_1$ and $x_2$ are jointly Gaussian, with
\[
x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_1 \\ A \mu_1 + b \end{pmatrix}, \begin{pmatrix} \Sigma_1 & \Sigma_1 A^T \\ A \Sigma_1 & \Sigma_{2|1} + A \Sigma_1 A^T \end{pmatrix} \right).
\]
We'll prove this in two steps. First, we'll show that the mean and covariance of $x$ take the form claimed above. Then, we'll write down the joint density $p(x_1, x_2) = p(x_1)\, p(x_2 \mid x_1)$ and show that it's proportional to $e^{-q(x)/2}$ for an appropriate quadratic $q(x)$. The result then follows from Theorem 2.
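Before the proof, here is a small Monte Carlo illustration of the theorem (the particular $\mu_1$, $\Sigma_1$, $A$, $b$, and $\Sigma_{2|1}$ below are arbitrary): sampling $x_1$ from its marginal and then $x_2$ from the conditional, the empirical mean and covariance of the stacked vector match the block formulas above, up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(3)
d1, d2, n = 2, 3, 200_000

# Arbitrary marginal/conditional parameters (illustrative values).
mu1 = rng.standard_normal(d1)
B1 = rng.standard_normal((d1, d1))
Sigma1 = B1 @ B1.T + np.eye(d1)                 # Cov(x1)
A = rng.standard_normal((d2, d1))
b = rng.standard_normal(d2)
B2 = rng.standard_normal((d2, d2))
Sigma21 = B2 @ B2.T + np.eye(d2)                # Cov(x2 | x1), i.e. Sigma_{2|1}

# Sample x1 ~ N(mu1, Sigma1), then x2 | x1 ~ N(A x1 + b, Sigma_{2|1}).
x1 = rng.multivariate_normal(mu1, Sigma1, size=n)
x2 = x1 @ A.T + b + rng.multivariate_normal(np.zeros(d2), Sigma21, size=n)
x = np.hstack([x1, x2])

# Joint mean and covariance claimed by the theorem.
mean_theory = np.concatenate([mu1, A @ mu1 + b])
cov_theory = np.block([[Sigma1,      Sigma1 @ A.T],
                       [A @ Sigma1,  Sigma21 + A @ Sigma1 @ A.T]])

print(np.max(np.abs(x.mean(axis=0) - mean_theory)))          # ~0 (Monte Carlo error)
print(np.max(np.abs(np.cov(x, rowvar=False) - cov_theory)))  # ~0 (Monte Carlo error)
```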

Proof. We're given that $\mathbb{E}\, x_1 = \mu_1$. For the other part of the mean vector, note that
\[
\mathbb{E}\, x_2 = \mathbb{E}\left[ \mathbb{E}\left( x_2 \mid x_1 \right) \right] = \mathbb{E}\left[ A x_1 + b \right] = A \mu_1 + b,
\]
which explains the lower entry in the mean. We are given that the marginal covariance of $x_1$ is $\Sigma_1$. That is,
\[
\mathbb{E}\left[ (x_1 - \mu_1)(x_1 - \mu_1)^T \right] = \Sigma_1.
\]
We're also given the conditional covariance of $x_2$:
\[
\mathbb{E}\left[ (x_2 - A x_1 - b)(x_2 - A x_1 - b)^T \mid x_1 \right] = \Sigma_{2|1}.
\]
We'll now try to express $\operatorname{Cov}(x_2)$ in terms of these expressions above. For convenience, we'll introduce the random variable $m_{2|1} = A x_1 + b$. (It's random because it depends on $x_1$.) Note that $\mathbb{E}\, m_{2|1} = \mathbb{E}\, x_2 = A \mu_1 + b$. So
\[
\begin{aligned}
\operatorname{Cov}(x_2)
&= \mathbb{E}\left[ (x_2 - \mathbb{E}\, x_2)(x_2 - \mathbb{E}\, x_2)^T \right] && \text{(by definition)} \\
&= \mathbb{E}\left[ \mathbb{E}\left( (x_2 - \mathbb{E}\, x_2)(x_2 - \mathbb{E}\, x_2)^T \mid x_1 \right) \right] && \text{(law of iterated expectations)} \\
&= \mathbb{E}\left[ \mathbb{E}\left( \left( (x_2 - m_{2|1}) + (m_{2|1} - \mathbb{E}\, x_2) \right) \left( (x_2 - m_{2|1}) + (m_{2|1} - \mathbb{E}\, x_2) \right)^T \mid x_1 \right) \right] \\
&= U + 2V + W,
\end{aligned}
\]
where we've multiplied out the parenthesized terms. The terms are as follows:
\[
U = \mathbb{E}\left[ \mathbb{E}\left( (x_2 - m_{2|1})(x_2 - m_{2|1})^T \mid x_1 \right) \right] = \Sigma_{2|1}.
\]
The cross-term turns out to be zero:
\[
\begin{aligned}
V &= \mathbb{E}\left[ \mathbb{E}\left( (x_2 - m_{2|1})(m_{2|1} - \mathbb{E}\, x_2)^T \mid x_1 \right) \right] \\
&= \mathbb{E}\left[ \mathbb{E}\left( (x_2 - A x_1 - b)(A x_1 + b - A \mu_1 - b)^T \mid x_1 \right) \right] \\
&= \mathbb{E}\big[ \underbrace{\mathbb{E}\left( x_2 - A x_1 - b \mid x_1 \right)}_{=0} (A x_1 + b - A \mu_1 - b)^T \big] \\
&= 0,
\end{aligned}
\]

where in the second-to-last step we used the fact that $\mathbb{E}\left[ f(x) g(x, y) \mid x \right] = f(x)\, \mathbb{E}\left[ g(x, y) \mid x \right]$. This same identity is used a couple more times below. Finally, the last term is
\[
\begin{aligned}
W &= \mathbb{E}\left[ \mathbb{E}\left( (m_{2|1} - \mathbb{E}\, m_{2|1})(m_{2|1} - \mathbb{E}\, m_{2|1})^T \mid x_1 \right) \right] \\
&= \mathbb{E}\left[ \mathbb{E}\left( (A x_1 - A \mu_1)(A x_1 - A \mu_1)^T \mid x_1 \right) \right] \\
&= \mathbb{E}\left[ (A x_1 - A \mu_1)(A x_1 - A \mu_1)^T \right] \\
&= A\, \mathbb{E}\left[ (x_1 - \mu_1)(x_1 - \mu_1)^T \right] A^T \\
&= A \Sigma_1 A^T.
\end{aligned}
\]
So $\operatorname{Cov}(x_2) = \Sigma_{2|1} + A \Sigma_1 A^T$. The top-right cross-covariance submatrix can be computed as follows:
\[
\begin{aligned}
\mathbb{E}\left[ (x_1 - \mu_1)(x_2 - A \mu_1 - b)^T \right]
&= \mathbb{E}\left[ \mathbb{E}\left( (x_1 - \mu_1)(x_2 - A \mu_1 - b)^T \mid x_1 \right) \right] \\
&= \mathbb{E}\left[ (x_1 - \mu_1)\, \mathbb{E}\left( (x_2 - A \mu_1 - b)^T \mid x_1 \right) \right] \\
&= \mathbb{E}\left[ (x_1 - \mu_1)(A x_1 + b - A \mu_1 - b)^T \right] \\
&= \mathbb{E}\left[ (x_1 - \mu_1)(x_1 - \mu_1)^T A^T \right] \\
&= \Sigma_1 A^T.
\end{aligned}
\]
Finally, the bottom-left cross-covariance matrix is just the transpose of the top-right. So far we have shown that $x = (x_1, x_2)$ has the mean and covariance specified in the theorem statement.
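The decomposition $\operatorname{Cov}(x_2) = U + 2V + W$ can also be seen empirically: simulating from the model (with arbitrary illustrative parameters), the three terms come out close to $\Sigma_{2|1}$, $0$, and $A \Sigma_1 A^T$, respectively. A sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
d1, d2, n = 2, 3, 200_000

# Same generative setup as in Theorem 5 (all parameter values illustrative).
mu1 = rng.standard_normal(d1)
B1 = rng.standard_normal((d1, d1))
Sigma1 = B1 @ B1.T + np.eye(d1)
A = rng.standard_normal((d2, d1))
b = rng.standard_normal(d2)
B2 = rng.standard_normal((d2, d2))
Sigma21 = B2 @ B2.T + np.eye(d2)

x1 = rng.multivariate_normal(mu1, Sigma1, size=n)
eps = rng.multivariate_normal(np.zeros(d2), Sigma21, size=n)
m21 = x1 @ A.T + b              # m_{2|1} = A x1 + b, a random vector
x2 = m21 + eps

# U: E[(x2 - m_{2|1})(x2 - m_{2|1})^T] should be Sigma_{2|1}.
U = (x2 - m21).T @ (x2 - m21) / n
# V: the cross term E[(x2 - m_{2|1})(m_{2|1} - E x2)^T] should vanish.
V = (x2 - m21).T @ (m21 - m21.mean(axis=0)) / n
# W: Cov(m_{2|1}) should be A Sigma1 A^T.
W = np.cov(m21, rowvar=False)

print(np.max(np.abs(U - Sigma21)))             # ~0 (Monte Carlo error)
print(np.max(np.abs(V)))                       # ~0 (Monte Carlo error)
print(np.max(np.abs(W - A @ Sigma1 @ A.T)))    # ~0 (Monte Carlo error)
print(np.max(np.abs(np.cov(x2, rowvar=False) - (Sigma21 + A @ Sigma1 @ A.T))))  # ~0
```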

We now show that the joint density is indeed Gaussian:
\[
\begin{aligned}
p(x_1, x_2) &= p(x_1)\, p(x_2 \mid x_1) \\
&= \mathcal{N}(x_1 \mid \mu_1, \Sigma_1)\, \mathcal{N}(x_2 \mid A x_1 + b, \Sigma_{2|1}) \\
&\propto \exp\left( -\tfrac{1}{2} (x_1 - \mu_1)^T \Sigma_1^{-1} (x_1 - \mu_1) \right) \exp\left( -\tfrac{1}{2} (x_2 - A x_1 - b)^T \Sigma_{2|1}^{-1} (x_2 - A x_1 - b) \right) \\
&= e^{-q(x)/2},
\end{aligned}
\]
where
\[
q(x) = (x_1 - \mu_1)^T \Sigma_1^{-1} (x_1 - \mu_1) + (x_2 - A x_1 - b)^T \Sigma_{2|1}^{-1} (x_2 - A x_1 - b).
\]
To apply Theorem 2, we need to make sure we can write the quadratic terms of $q(x)$ as $x^T M x$, where $M$ is symmetric positive definite. We'll separate out the quadratic terms in $q(x)$ and write "l.o.t." for lower order terms, which include linear terms of the form $b^T x$ and constants:
\[
\begin{aligned}
q(x) &= x_1^T \Sigma_1^{-1} x_1 + (x_2 - A x_1)^T \Sigma_{2|1}^{-1} (x_2 - A x_1) + \text{l.o.t.} \\
&= x_1^T \left( \Sigma_1^{-1} + A^T \Sigma_{2|1}^{-1} A \right) x_1 - 2 x_1^T A^T \Sigma_{2|1}^{-1} x_2 + x_2^T \Sigma_{2|1}^{-1} x_2 + \text{l.o.t.} \\
&= \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}^T \begin{pmatrix} \Sigma_1^{-1} + A^T \Sigma_{2|1}^{-1} A & -A^T \Sigma_{2|1}^{-1} \\ -\Sigma_{2|1}^{-1} A & \Sigma_{2|1}^{-1} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \text{l.o.t.}
\end{aligned}
\]
Let $M$ be the matrix in the middle. We only need to show that $M$ is positive definite. By definition, the symmetric matrix $M$ is positive definite iff for all $x \neq 0$ we have $x^T M x > 0$. Referring back to our first expression for $q(x)$ in the equation block above, note that
\[
x^T M x = \underbrace{x_1^T \Sigma_1^{-1} x_1}_{\alpha_1} + \underbrace{(x_2 - A x_1)^T \Sigma_{2|1}^{-1} (x_2 - A x_1)}_{\alpha_2},
\]
where we'll refer to the first term on the right-hand side as $\alpha_1$ and the second term as $\alpha_2$. Suppose $x_1 \neq 0$. Then since $\Sigma_1^{-1}$ is positive definite, $\alpha_1 > 0$, and since $\Sigma_{2|1}^{-1}$ is positive definite, $\alpha_2 \geq 0$ (it could still be $0$ if $x_2 = A x_1$). Thus $x^T M x > 0$. Suppose instead that $x_1 = 0$ and $x_2 \neq 0$. Then $\alpha_1 = 0$ and $\alpha_2 = x_2^T \Sigma_{2|1}^{-1} x_2 > 0$, by positive definiteness.

So again $x^T M x > 0$. This demonstrates that $M$ is positive definite. Thus $p(x) \propto e^{-q(x)/2}$, where $q(x)$ has the form required by Theorem 2. We conclude that $x = (x_1, x_2)$ is jointly Gaussian. We have also shown that the marginal means and covariances, as well as the cross-covariances, all have the forms claimed.
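Combining Theorem 2 with Theorem 5, the matrix $M$ assembled above must be the inverse of the joint covariance matrix, i.e. the precision matrix of the joint distribution. A quick numerical check of this (with arbitrary illustrative parameters), along with a Cholesky factorization to confirm positive definiteness:

```python
import numpy as np

rng = np.random.default_rng(5)
d1, d2 = 2, 3

# Arbitrary parameters for Theorem 5 (illustrative values).
B1 = rng.standard_normal((d1, d1))
Sigma1 = B1 @ B1.T + np.eye(d1)
A = rng.standard_normal((d2, d1))
B2 = rng.standard_normal((d2, d2))
Sigma21 = B2 @ B2.T + np.eye(d2)

Sigma1_inv = np.linalg.inv(Sigma1)
Sigma21_inv = np.linalg.inv(Sigma21)

# The matrix M from the quadratic form in the proof.
M = np.block([[Sigma1_inv + A.T @ Sigma21_inv @ A, -A.T @ Sigma21_inv],
              [-Sigma21_inv @ A,                    Sigma21_inv]])

# The joint covariance claimed by Theorem 5.
Sigma_joint = np.block([[Sigma1,      Sigma1 @ A.T],
                        [A @ Sigma1,  Sigma21 + A @ Sigma1 @ A.T]])

print(np.allclose(M @ Sigma_joint, np.eye(d1 + d2)))   # True: M is the joint precision matrix
np.linalg.cholesky(M)                                  # succeeds, so M is positive definite
print("M is symmetric positive definite")
```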
