The Multivariate Gaussian Distribution [DRAFT]
David S. Rosenberg

Abstract

This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs, because I don't know how to do so without either being very tedious and technical, or using some mathematics that is beyond the prerequisites.

To my knowledge, there are two primary approaches to developing the theory of multivariate Gaussian distributions. The first, and by far the most common approach in machine learning textbooks, is to define the multivariate Gaussian distribution in terms of its density function, and to derive results by manipulating these density functions. With this approach, a lot of the work turns out to be elaborate matrix algebra calculations happening inside the exponent of the Gaussian density. One issue with this approach is that the multivariate Gaussian density is only defined when the covariance matrix is invertible. To keep the derivations rigorous, some care must be taken to justify that the new covariance matrices we come up with are invertible. For my taste, the rigor in our textbooks is a bit light on these points. We've included the proof of the theorem in Section 4 to give a flavor of the details one should add.

The second major approach to multivariate Gaussian distributions does not use density functions at all and does not require invertible covariance matrices. This approach is much cleaner and more elegant, but it relies on the theory of characteristic functions and the Cramér-Wold device to get started, and these are beyond the prerequisites for this course. You can often find this development in more advanced probability and statistics books, such as Chapter 8 of Rao's excellent Linear Statistical Inference and Its Applications.
1 Multivariate Gaussian Density

A random vector $x \in \mathbb{R}^d$ has a $d$-dimensional multivariate Gaussian distribution with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$ if its density is given by

$$ \mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right), $$

where $|\Sigma|$ denotes the determinant of $\Sigma$. Note that this expression requires that the covariance matrix $\Sigma$ be invertible.¹ Sometimes we will rewrite the factor in front of the $\exp$ as $|2\pi\Sigma|^{-1/2}$, which follows from basic facts about determinants.

Exercise 1. There are at least two claims implicit in this definition. First, that the expression given is, in fact, a density, i.e. it is non-negative and integrates to 1. Second, that the density corresponds to a distribution with mean $\mu$ and covariance $\Sigma$, as claimed.

2 Recognizing a Gaussian Density

If we come across a density function of the form $p(x) \propto e^{-q(x)/2}$, where $q(x)$ is a positive definite quadratic function, then $p(x)$ is the density of a Gaussian distribution. More precisely, we have the following theorem:

Theorem 2. Consider the quadratic function $q(x) = x^T \Lambda x - 2 b^T x + c$, for any symmetric positive definite $\Lambda \in \mathbb{R}^{d \times d}$, any $b \in \mathbb{R}^d$, and any $c \in \mathbb{R}$. If $p(x)$ is a density function with $p(x) \propto e^{-q(x)/2}$, then $p(x)$ is a multivariate Gaussian density with mean $\Lambda^{-1} b$ and covariance $\Lambda^{-1}$. That is,

$$ p(x) = \frac{|\Lambda|^{1/2}}{(2\pi)^{d/2}} \exp\left( -\frac{1}{2} (x - \Lambda^{-1} b)^T \Lambda (x - \Lambda^{-1} b) \right). $$

¹We can have a $d$-dimensional Gaussian distribution with a non-invertible $\Sigma$, but such a distribution will not have a density on $\mathbb{R}^d$, and we will not address that case here.
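Before turning to the proof, Theorem 2 is easy to sanity-check numerically. Here is a minimal sketch (assuming NumPy and SciPy are available; the particular $\Lambda$ and $b$ are arbitrary choices made for illustration). Since the normalizing constant and $c$ cancel in log-density differences, we compare those:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

d = 3
G = rng.standard_normal((d, d))
Lam = G @ G.T + d * np.eye(d)   # an arbitrary symmetric positive definite Lambda
b = rng.standard_normal(d)

# Theorem 2's claim: the density proportional to exp(-q(x)/2), with
# q(x) = x'Lam x - 2 b'x + c, is Gaussian with mean Lam^{-1} b, cov Lam^{-1}.
mu = np.linalg.solve(Lam, b)
Sigma = np.linalg.inv(Lam)

def log_unnormalized(x, c=0.0):
    """log of exp(-q(x)/2); the constant c only shifts the log-density."""
    return -0.5 * (x @ Lam @ x - 2 * b @ x + c)

x, y = rng.standard_normal(d), rng.standard_normal(d)
lhs = log_unnormalized(x) - log_unnormalized(y)
rhs = (multivariate_normal.logpdf(x, mean=mu, cov=Sigma)
       - multivariate_normal.logpdf(y, mean=mu, cov=Sigma))
assert np.isclose(lhs, rhs)
```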
Note: The inverse of the covariance matrix is called the precision matrix, and precision matrices of multivariate Gaussians have some interesting properties. The parameterization in terms of $\Lambda$ and $b$, as in Theorem 2, is known as the information form or canonical form of the Gaussian density (cf. Murphy).

Proof. Completing the square, we have

$$ q(x) = x^T \Lambda x - 2 b^T x + c = (x - \Lambda^{-1} b)^T \Lambda (x - \Lambda^{-1} b) - b^T \Lambda^{-1} b + c. $$

Since the last two terms are independent of $x$, when we exponentiate $-q(x)/2$ they can be absorbed into the constant of proportionality. That is,

$$ e^{-q(x)/2} = \exp\left( -\frac{1}{2} (x - \Lambda^{-1} b)^T \Lambda (x - \Lambda^{-1} b) \right) \exp\left( -\frac{1}{2} \left( c - b^T \Lambda^{-1} b \right) \right) \propto \exp\left( -\frac{1}{2} (x - \Lambda^{-1} b)^T \Lambda (x - \Lambda^{-1} b) \right). $$

Now recall that the density function of the multivariate Gaussian $\mathcal{N}(\mu, \Sigma)$ is

$$ \phi(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right). $$

Thus we see that $p(x)$ must also be a Gaussian density, with covariance $\Sigma = \Lambda^{-1}$ and mean $\Lambda^{-1} b$.
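Since completing the square is the workhorse calculation throughout this document, here is a one-off numerical confirmation of the identity above (a sketch with arbitrary illustrative $\Lambda$, $b$, and $c$; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

d = 4
G = rng.standard_normal((d, d))
Lam = G @ G.T + d * np.eye(d)        # symmetric positive definite
b, c = rng.standard_normal(d), 0.7
mu = np.linalg.solve(Lam, b)         # Lam^{-1} b

# q(x) computed directly and via the completed square must agree.
x = rng.standard_normal(d)
q_direct = x @ Lam @ x - 2 * b @ x + c
q_completed = (x - mu) @ Lam @ (x - mu) - b @ mu + c   # b @ mu = b' Lam^{-1} b
assert np.isclose(q_direct, q_completed)
```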
3 Conditional Distributions (Bishop Section 2.3)

Let $x \in \mathbb{R}^d$ have a Gaussian distribution: $x \sim \mathcal{N}(\mu, \Sigma)$. Let's partition the random variables in $x$ into two pieces:

$$ x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, $$

where $x_1 \in \mathbb{R}^{d_1}$, $x_2 \in \mathbb{R}^{d_2}$, and $d = d_1 + d_2$. Similarly, we'll partition the mean vector, the covariance matrix, and the precision matrix as

$$ \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \qquad \Lambda = \Sigma^{-1} = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix}, $$

where $\mu_1 \in \mathbb{R}^{d_1}$, $\Sigma_{12} \in \mathbb{R}^{d_1 \times d_2}$, $\Lambda_{12} \in \mathbb{R}^{d_1 \times d_2}$, etc. Note that by the symmetry of the covariance matrix $\Sigma$, we have $\Sigma_{12} = \Sigma_{21}^T$. Note also that $\Sigma_{11}$ and $\Sigma_{22}$ are symmetric positive definite (spd) whenever $\Sigma$ is, which is immediate from the definition of spd.

When $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ has a Gaussian distribution, we say that $x_1$ and $x_2$ are jointly Gaussian. Can we conclude anything about the marginal distributions of $x_1$ and $x_2$? Indeed, the following theorem states that they are individually Gaussian:

Theorem 3. Let $x$, $\mu$, and $\Sigma$ be as defined above. Then the marginal distributions of $x_1$ and $x_2$ are each Gaussian, with

$$ x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11}), \qquad x_2 \sim \mathcal{N}(\mu_2, \Sigma_{22}). $$

Proof. See Bishop Section 2.3.2, p. 88. This can be done by showing that the marginal density $p(x_1) = \int p(x_1, x_2)\, dx_2$ has the form claimed, and similarly for $x_2$.

So when $x_1$ and $x_2$ are jointly Gaussian, we know that $x_1$ and $x_2$ are also marginally Gaussian. It turns out that the conditional distributions $x_1 \mid x_2$ and $x_2 \mid x_1$ are also Gaussian:

Theorem 4. Let $x$, $\mu$, and $\Sigma$ be as defined above. Assume that $\Sigma_{22}$ is positive definite.² Then the distribution of $x_1$ given $x_2$ is multivariate normal. More specifically,

$$ x_1 \mid x_2 \sim \mathcal{N}(\mu_{1|2}, \Sigma_{1|2}), $$

where

$$ \mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2), \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}. $$

Note that $\Sigma_{22}$ is positive definite, and thus invertible, since $\Sigma$ is positive definite.

Proof. See Bishop Section 2.3.1.

²In fact, this is implied by our assumption that $\Sigma$ is positive definite.
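In code, Theorem 4 amounts to a few lines of linear algebra. Here is a minimal sketch (NumPy assumed; the partitioned $\mu$ and $\Sigma$ and the observed value $v$ of $x_2$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

d1, d2 = 2, 3
d = d1 + d2
G = rng.standard_normal((d, d))
Sigma = G @ G.T + d * np.eye(d)    # an arbitrary positive definite covariance
mu = rng.standard_normal(d)

mu1, mu2 = mu[:d1], mu[d1:]
S11, S12 = Sigma[:d1, :d1], Sigma[:d1, d1:]
S21, S22 = Sigma[d1:, :d1], Sigma[d1:, d1:]

v = rng.standard_normal(d2)        # the observed value of x2

# mu_{1|2} = mu1 + S12 S22^{-1} (v - mu2),  Sigma_{1|2} = S11 - S12 S22^{-1} S21
K = S12 @ np.linalg.inv(S22)
mu_cond = mu1 + K @ (v - mu2)
Sigma_cond = S11 - K @ S21

print(mu_cond)
print(Sigma_cond)
```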
Example. Consider a standard regression framework in which we are building a predictive model for $x_1 \in \mathbb{R}$ given $x_2 \in \mathbb{R}^{d_2}$. Recall that if we are using the square loss, then the Bayes optimal prediction function is $f(x_2) = \mathbb{E}[x_1 \mid x_2]$. If we assume that $x_1$ and $x_2$ are jointly Gaussian with a positive definite covariance matrix, then Theorem 4 tells us that

$$ \mathbb{E}[x_1 \mid x_2] = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2). $$

Of course, in practice we don't know $\mu$ and $\Sigma$. Nevertheless, what's interesting is that the Bayes optimal prediction function is an affine function of $x_2$, i.e. a linear function plus a constant. Thus if we think that our input vector $x_2$ and our response variable $x_1$ are jointly Gaussian, there's no reason to go beyond a hypothesis space of affine functions of $x_2$. In other words, linear regression is all we need.

4 Joint Distribution from Marginal + Conditional

In Section 3, we found that if $x_1$ and $x_2$ are jointly Gaussian, then $x_2$ is marginally Gaussian, and the conditional distribution $x_1 \mid x_2$ is also Gaussian, with a mean that is an affine function of $x_2$. The following theorem shows that we can go in the reverse direction as well.

Theorem 5. Suppose $x_1 \sim \mathcal{N}(\mu_1, \Sigma_1)$ and $x_2 \mid x_1 \sim \mathcal{N}(A x_1 + b, \Sigma_{2|1})$, for some $\mu_1 \in \mathbb{R}^{d_1}$, $\Sigma_1 \in \mathbb{R}^{d_1 \times d_1}$, $A \in \mathbb{R}^{d_2 \times d_1}$, $b \in \mathbb{R}^{d_2}$, and $\Sigma_{2|1} \in \mathbb{R}^{d_2 \times d_2}$. Then $x_1$ and $x_2$ are jointly Gaussian, with

$$ x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_1 \\ A \mu_1 + b \end{pmatrix}, \begin{pmatrix} \Sigma_1 & \Sigma_1 A^T \\ A \Sigma_1 & \Sigma_{2|1} + A \Sigma_1 A^T \end{pmatrix} \right). $$

We'll prove this in two steps. First, we'll show that the mean and covariance of $x$ take the form claimed above. Then, we'll write down the joint density $p(x_1, x_2) = p(x_1)\, p(x_2 \mid x_1)$ and show that it is proportional to $e^{-q(x)/2}$ for an appropriate quadratic $q(x)$. The result then follows from Theorem 2.
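Before working through the proof, here is a numerical sketch of the theorem's claim (again with arbitrary illustrative values; NumPy assumed). We assemble the joint mean and covariance from the marginal and conditional pieces, then read the conditional $x_2 \mid x_1$ back off the joint, using Theorem 4 with the roles of $x_1$ and $x_2$ swapped, and confirm that we recover $A x_1 + b$ and $\Sigma_{2|1}$:

```python
import numpy as np

rng = np.random.default_rng(3)

d1, d2 = 2, 2
G1 = rng.standard_normal((d1, d1))
Sigma1 = G1 @ G1.T + d1 * np.eye(d1)   # marginal covariance of x1
mu1 = rng.standard_normal(d1)
A = rng.standard_normal((d2, d1))
b = rng.standard_normal(d2)
G2 = rng.standard_normal((d2, d2))
Sigma21 = G2 @ G2.T + d2 * np.eye(d2)  # Sigma_{2|1}, the conditional covariance

# Joint mean and covariance as claimed by Theorem 5.
mu = np.concatenate([mu1, A @ mu1 + b])
Sigma = np.block([[Sigma1,     Sigma1 @ A.T],
                  [A @ Sigma1, Sigma21 + A @ Sigma1 @ A.T]])

# Conditional of x2 given x1 = v, read off the joint via Theorem 4.
v = rng.standard_normal(d1)
K = Sigma[d1:, :d1] @ np.linalg.inv(Sigma[:d1, :d1])
cond_mean = mu[d1:] + K @ (v - mu[:d1])
cond_cov = Sigma[d1:, d1:] - K @ Sigma[:d1, d1:]

assert np.allclose(cond_mean, A @ v + b)   # recovers A x1 + b
assert np.allclose(cond_cov, Sigma21)      # recovers Sigma_{2|1}
```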
Proof. We're given that $\mathbb{E}\, x_1 = \mu_1$. For the other part of the mean vector, note that

$$ \mathbb{E}\, x_2 = \mathbb{E}\, \mathbb{E}[x_2 \mid x_1] = \mathbb{E}[A x_1 + b] = A \mu_1 + b, $$

which explains the lower entry in the mean. We are given that the marginal covariance of $x_1$ is $\Sigma_1$. That is,

$$ \mathbb{E}\left[ (x_1 - \mu_1)(x_1 - \mu_1)^T \right] = \Sigma_1. $$

We're also given the conditional covariance of $x_2$:

$$ \mathbb{E}\left[ (x_2 - A x_1 - b)(x_2 - A x_1 - b)^T \mid x_1 \right] = \Sigma_{2|1}. $$

We'll now try to express $\operatorname{Cov}(x_2)$ in terms of the expressions above. For convenience, we'll introduce the random variable $m_{2|1} = A x_1 + b$. It's random because it depends on $x_1$. Note that $\mathbb{E}\, m_{2|1} = \mathbb{E}\, x_2 = A \mu_1 + b$. So

$$
\begin{aligned}
\operatorname{Cov}(x_2) &= \mathbb{E}\left[ (x_2 - \mathbb{E} x_2)(x_2 - \mathbb{E} x_2)^T \right] && \text{by definition} \\
&= \mathbb{E}\, \mathbb{E}\left[ (x_2 - \mathbb{E} x_2)(x_2 - \mathbb{E} x_2)^T \mid x_1 \right] && \text{law of iterated expectations} \\
&= \mathbb{E}\, \mathbb{E}\left[ \big( (x_2 - m_{2|1}) + (m_{2|1} - \mathbb{E} x_2) \big) \big( (x_2 - m_{2|1}) + (m_{2|1} - \mathbb{E} x_2) \big)^T \,\middle|\, x_1 \right] \\
&= U + V + V^T + W,
\end{aligned}
$$

where we've multiplied out the parenthesized terms. The terms are as follows:

$$ U = \mathbb{E}\, \mathbb{E}\left[ (x_2 - m_{2|1})(x_2 - m_{2|1})^T \mid x_1 \right] = \Sigma_{2|1}. $$

The cross-term turns out to be zero:

$$
\begin{aligned}
V &= \mathbb{E}\, \mathbb{E}\left[ (x_2 - m_{2|1})(m_{2|1} - \mathbb{E} x_2)^T \mid x_1 \right] \\
&= \mathbb{E}\, \mathbb{E}\left[ (x_2 - A x_1 - b)(A x_1 + b - A \mu_1 - b)^T \mid x_1 \right] \\
&= \mathbb{E}\Big[ \underbrace{\mathbb{E}\left[ x_2 - (A x_1 + b) \mid x_1 \right]}_{=0} (A x_1 - A \mu_1)^T \Big] \\
&= 0,
\end{aligned}
$$
where in the second-to-last step we used the fact that $\mathbb{E}[f(x) g(x, y) \mid x] = f(x)\, \mathbb{E}[g(x, y) \mid x]$. This same identity is used a couple more times below. Finally, the last term is

$$
\begin{aligned}
W &= \mathbb{E}\, \mathbb{E}\left[ (m_{2|1} - \mathbb{E}\, m_{2|1})(m_{2|1} - \mathbb{E}\, m_{2|1})^T \mid x_1 \right] \\
&= \mathbb{E}\, \mathbb{E}\left[ (A x_1 - A \mu_1)(A x_1 - A \mu_1)^T \mid x_1 \right] \\
&= \mathbb{E}\left[ (A x_1 - A \mu_1)(A x_1 - A \mu_1)^T \right] \\
&= A\, \mathbb{E}\left[ (x_1 - \mu_1)(x_1 - \mu_1)^T \right] A^T \\
&= A \Sigma_1 A^T.
\end{aligned}
$$

So $\operatorname{Cov}(x_2) = \Sigma_{2|1} + A \Sigma_1 A^T$. The top-right cross-covariance submatrix can be computed as follows:

$$
\begin{aligned}
\mathbb{E}\left[ (x_1 - \mu_1)(x_2 - A \mu_1 - b)^T \right] &= \mathbb{E}\, \mathbb{E}\left[ (x_1 - \mu_1)(x_2 - A \mu_1 - b)^T \mid x_1 \right] \\
&= \mathbb{E}\left[ (x_1 - \mu_1)\, \mathbb{E}\left[ (x_2 - A \mu_1 - b)^T \mid x_1 \right] \right] \\
&= \mathbb{E}\left[ (x_1 - \mu_1)(A x_1 + b - A \mu_1 - b)^T \right] \\
&= \mathbb{E}\left[ (x_1 - \mu_1)(x_1 - \mu_1)^T \right] A^T \\
&= \Sigma_1 A^T.
\end{aligned}
$$

Finally, the bottom-left cross-covariance matrix is just the transpose of the top-right. So far we have shown that $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ has the mean and covariance specified in the theorem statement. We now show that the joint density is indeed Gaussian:
$$
\begin{aligned}
p(x_1, x_2) &= p(x_1)\, p(x_2 \mid x_1) \\
&= \mathcal{N}(x_1 \mid \mu_1, \Sigma_1)\, \mathcal{N}(x_2 \mid A x_1 + b, \Sigma_{2|1}) \\
&\propto \exp\left( -\tfrac{1}{2} (x_1 - \mu_1)^T \Sigma_1^{-1} (x_1 - \mu_1) \right) \exp\left( -\tfrac{1}{2} (x_2 - A x_1 - b)^T \Sigma_{2|1}^{-1} (x_2 - A x_1 - b) \right) \\
&= e^{-q(x)/2},
\end{aligned}
$$

where

$$ q(x) = (x_1 - \mu_1)^T \Sigma_1^{-1} (x_1 - \mu_1) + (x_2 - A x_1 - b)^T \Sigma_{2|1}^{-1} (x_2 - A x_1 - b). $$

To apply Theorem 2, we need to make sure we can write the quadratic terms of $q(x)$ as $x^T M x$, where $M$ is symmetric positive definite. We'll separate out the quadratic terms of $q(x)$ and write "l.o.t." for the lower-order terms, which include linear terms of the form $b^T x$ as well as constants:

$$
\begin{aligned}
q(x) &= x_1^T \Sigma_1^{-1} x_1 + (x_2 - A x_1)^T \Sigma_{2|1}^{-1} (x_2 - A x_1) + \text{l.o.t.} \\
&= x_1^T \left( \Sigma_1^{-1} + A^T \Sigma_{2|1}^{-1} A \right) x_1 - 2\, x_1^T A^T \Sigma_{2|1}^{-1} x_2 + x_2^T \Sigma_{2|1}^{-1} x_2 + \text{l.o.t.} \\
&= \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}^T \begin{pmatrix} \Sigma_1^{-1} + A^T \Sigma_{2|1}^{-1} A & -A^T \Sigma_{2|1}^{-1} \\ -\Sigma_{2|1}^{-1} A & \Sigma_{2|1}^{-1} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \text{l.o.t.}
\end{aligned}
$$

Let $M$ be the matrix in the middle. We only need to show that $M$ is positive definite. By definition, the symmetric matrix $M$ is positive definite iff for all $x \neq 0$ we have $x^T M x > 0$. Referring back to our first expression for $q(x)$ in the equation block above, note that

$$ x^T M x = \underbrace{x_1^T \Sigma_1^{-1} x_1}_{\alpha_1} + \underbrace{(x_2 - A x_1)^T \Sigma_{2|1}^{-1} (x_2 - A x_1)}_{\alpha_2}, $$

where we'll refer to the first term on the RHS as $\alpha_1$ and the second as $\alpha_2$. Suppose $x_1 \neq 0$. Then since $\Sigma_1^{-1}$ is positive definite, $\alpha_1 > 0$, and since $\Sigma_{2|1}^{-1}$ is positive definite, $\alpha_2 \geq 0$ (it could still be $0$, if $x_2 = A x_1$). Thus $x^T M x > 0$. Now suppose $x_1 = 0$ and $x_2 \neq 0$. Then $\alpha_1 = 0$ and $\alpha_2 = x_2^T \Sigma_{2|1}^{-1} x_2 > 0$, by positive definiteness.
So again $x^T M x > 0$. This demonstrates that $M$ is positive definite. Thus $p(x) \propto e^{-q(x)/2}$, where $q(x)$ has the form required by Theorem 2. We conclude that $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ is jointly Gaussian. We have also shown that the marginal means and covariances, as well as the cross-covariances, all have the forms claimed.
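The moment calculations in the proof are also easy to sanity-check by simulation. Here is a rough Monte Carlo sketch (all dimensions and matrices are arbitrary illustrative choices; the comparisons use loose tolerances since the moments are estimated from a finite sample, so each line should print True up to Monte Carlo error):

```python
import numpy as np

rng = np.random.default_rng(4)

d1, d2, n = 2, 3, 500_000
G1 = rng.standard_normal((d1, d1))
Sigma1 = G1 @ G1.T + d1 * np.eye(d1)   # marginal covariance of x1
mu1 = rng.standard_normal(d1)
A = rng.standard_normal((d2, d1))
b = rng.standard_normal(d2)
G2 = rng.standard_normal((d2, d2))
Sigma21 = G2 @ G2.T + d2 * np.eye(d2)  # Sigma_{2|1}

# Simulate the generative story: x1 ~ N(mu1, Sigma1), then
# x2 | x1 ~ N(A x1 + b, Sigma_{2|1}).
x1 = rng.multivariate_normal(mu1, Sigma1, size=n)
x2 = x1 @ A.T + b + rng.multivariate_normal(np.zeros(d2), Sigma21, size=n)

# Compare empirical moments with the theorem's claims.
print(np.allclose(np.cov(x2, rowvar=False),
                  Sigma21 + A @ Sigma1 @ A.T, atol=0.25))  # Cov(x2)
cross = (x1 - x1.mean(0)).T @ (x2 - x2.mean(0)) / (n - 1)
print(np.allclose(cross, Sigma1 @ A.T, atol=0.25))         # Cov(x1, x2)
```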