CMPE 58K Bayesian Statistics and Machine Learning Lecture 5
1 CMPE 58K Bayesian Statistics and Machine Learning, Lecture 5
Multivariate distributions: Gaussian, Bernoulli, probability tables
Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
Instructor: A. Taylan Cemgil, Fall 2009

Cemgil, CMPE 58K Bayesian Statistics and Machine Learning, Lecture 5, Fall 2009, Boğaziçi University, Istanbul
2 Multivariate Bernoulli

A probability table with binary states, p(x_1, x_2):

             x_2 = 0   x_2 = 1
  x_1 = 0    p_{00}    p_{01}
  x_1 = 1    p_{10}    p_{11}

p(x_1, x_2) = p_{00}^{(1-x_1)(1-x_2)} \, p_{10}^{x_1(1-x_2)} \, p_{01}^{(1-x_1)x_2} \, p_{11}^{x_1 x_2}
            = \exp\{ (1-x_1)(1-x_2) \log p_{00} + x_1(1-x_2) \log p_{10} + (1-x_1)x_2 \log p_{01} + x_1 x_2 \log p_{11} \}
            = \exp\{ \log p_{00} + x_1 \log \frac{p_{10}}{p_{00}} + x_2 \log \frac{p_{01}}{p_{00}} + x_1 x_2 \log \frac{p_{00} p_{11}}{p_{10} p_{01}} \}
3 Multivariate Bernoulli

Independent case, p(x_1, x_2) = p(x_1) p(x_2):

             x_2 = 0    x_2 = 1
  x_1 = 0    p_0 q_0    p_0 q_1
  x_1 = 1    p_1 q_0    p_1 q_1

p(x_1, x_2) = \exp\{ \log p_0 q_0 + x_1 \log \frac{p_1 q_0}{p_0 q_0} + x_2 \log \frac{p_0 q_1}{p_0 q_0} + x_1 x_2 \log \frac{p_0 q_0 \, p_1 q_1}{p_1 q_0 \, p_0 q_1} \}
            = \exp\{ \log p_0 q_0 + x_1 \log \frac{p_1}{p_0} + x_2 \log \frac{q_1}{q_0} + x_1 x_2 \log 1 \}

The coupling parameter is zero.
4 Multivariate Bernoulli

Canonical parameters:

p(x_1, x_2) = \exp\{ g + h_1 x_1 + h_2 x_2 + \theta_{1,2} x_1 x_2 \}

h_1 = \log \frac{p_{10}}{p_{00}}    h_2 = \log \frac{p_{01}}{p_{00}}    \theta_{1,2} = \log \frac{p_{00} p_{11}}{p_{10} p_{01}}

h_i: threshold; \theta_{i,j}: pairwise coupling. The p's are the moment parameters. Coupling parameters are harder to interpret but provide direct information about the conditional independence structure.
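As a quick sanity check, the table-to-canonical conversion above can be coded directly. This is an illustrative sketch of ours, not part of the slides; the function name and the example table are invented:

```python
import numpy as np

def bernoulli_canonical(P):
    """Convert a 2x2 table P[x1, x2] into the canonical parameters of
    p(x1, x2) = exp(g + h1*x1 + h2*x2 + theta12*x1*x2)."""
    g = np.log(P[0, 0])
    h1 = np.log(P[1, 0] / P[0, 0])
    h2 = np.log(P[0, 1] / P[0, 0])
    theta12 = np.log(P[0, 0] * P[1, 1] / (P[1, 0] * P[0, 1]))
    return g, h1, h2, theta12

# An arbitrary (normalized) probability table.
P = np.array([[0.3, 0.2],
              [0.1, 0.4]])
g, h1, h2, t12 = bernoulli_canonical(P)

# Rebuild the table from the canonical form and compare with the original.
Q = np.array([[np.exp(g + h1 * x1 + h2 * x2 + t12 * x1 * x2)
               for x2 in (0, 1)] for x1 in (0, 1)])
print(np.allclose(P, Q))  # -> True
```

A nonzero theta12 (here log 6) signals coupling between x_1 and x_2; for an independent table it would be exactly zero.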
5 Example

x_i \in \{0, 1\}

p(x_1, x_2) \propto \exp\{ 2 x_1 - 0.5 x_2 - 3 x_1 x_2 \}
q(x_1, x_2) \propto \exp\{ 2 x_1 - 0.5 x_2 + 0 \cdot x_1 x_2 \}

Are x_1 and x_2 independent, given p or given q? What are p(x_1) and q(x_1)?
6 Odds and the Cross Product Ratio (cpr)

For two events A and B:

\mathrm{odds}(A) = \frac{p(A)}{p(A^c)}
\mathrm{odds}(A \mid B) = \frac{p(A \mid B)}{p(A^c \mid B)}
\mathrm{cpr}(A, B) = \frac{p(A \cap B) \, p(A^c \cap B^c)}{p(A^c \cap B) \, p(A \cap B^c)} = \frac{\mathrm{odds}(A \mid B)}{\mathrm{odds}(A \mid B^c)}

When the events A and B are independent, we have p(A \mid B) = p(A), so

\mathrm{cpr}(A, B) = \frac{\mathrm{odds}(A)}{\mathrm{odds}(A)} = 1
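A small numerical illustration (ours, not from the slides): for an independent table P[x1, x2] = p(x1) q(x2), the cross product ratio is exactly 1, so the coupling parameter log cpr vanishes:

```python
import numpy as np

# Arbitrary marginals; the product table is independent by construction.
p = np.array([0.7, 0.3])   # p(x1)
q = np.array([0.4, 0.6])   # q(x2)
P = np.outer(p, q)         # P[x1, x2] = p(x1) q(x2)

# cpr(x1=1, x2=1) = P[1,1] P[0,0] / (P[0,1] P[1,0])
cpr = (P[1, 1] * P[0, 0]) / (P[0, 1] * P[1, 0])
print(cpr)  # -> 1.0, so the coupling parameter log(cpr) is 0
```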
7 Odds and the Cross Product Ratio (cpr)

The univariate Bernoulli distribution is parametrised by the event A : \{x_1 = 1\}:

p(x_1) = \exp\{ \log p_0 + x_1 \log \frac{p_1}{p_0} \} = \exp\{ \log p_0 + x_1 \log \mathrm{odds}(x_1 = 1) \}

Bivariate Bernoulli distribution:

p(x_1, x_2) = \exp\{ \log p_{00} + x_1 \log \frac{p_{10}}{p_{00}} + x_2 \log \frac{p_{01}}{p_{00}} + x_1 x_2 \log \frac{p_{00} p_{11}}{p_{10} p_{01}} \}
            = \exp\{ \log p_{00} + x_1 \log \mathrm{odds}(x_1 = 1 \mid x_2 = 0) + x_2 \log \mathrm{odds}(x_2 = 1 \mid x_1 = 0) + x_1 x_2 \log \mathrm{cpr}(x_1 = 1, x_2 = 1) \}
8 Multivariate Bernoulli

For the trivariate case, we have pairwise and triple interactions:

p(x_1, x_2, x_3) = p_{000}^{(1-x_1)(1-x_2)(1-x_3)} \, p_{100}^{x_1(1-x_2)(1-x_3)} \, p_{010}^{(1-x_1)x_2(1-x_3)} \, p_{110}^{x_1 x_2 (1-x_3)} \, p_{001}^{(1-x_1)(1-x_2)x_3} \, p_{101}^{x_1(1-x_2)x_3} \, p_{011}^{(1-x_1)x_2 x_3} \, p_{111}^{x_1 x_2 x_3}

\log p(x) = \log p_{000} + \log\frac{p_{100}}{p_{000}} x_1 + \log\frac{p_{010}}{p_{000}} x_2 + \log\frac{p_{001}}{p_{000}} x_3
          + \log\frac{p_{000} p_{110}}{p_{010} p_{100}} x_1 x_2 + \log\frac{p_{000} p_{101}}{p_{001} p_{100}} x_1 x_3 + \log\frac{p_{000} p_{011}}{p_{001} p_{010}} x_2 x_3
          + \log\frac{p_{001} p_{010} p_{100} p_{111}}{p_{000} p_{011} p_{101} p_{110}} x_1 x_2 x_3
9 In terms of odds and cross product ratios:

\log p(x) = \log p_{000} + \log \mathrm{odds}(x_1 = 1 \mid x_2 = 0, x_3 = 0) \, x_1 + \log \mathrm{odds}(x_2 = 1 \mid x_1 = 0, x_3 = 0) \, x_2 + \log \mathrm{odds}(x_3 = 1 \mid x_1 = 0, x_2 = 0) \, x_3
          + \log \mathrm{cpr}(x_1 = 1, x_2 = 1 \mid x_3 = 0) \, x_1 x_2 + \log \mathrm{cpr}(x_1 = 1, x_3 = 1 \mid x_2 = 0) \, x_1 x_3 + \log \mathrm{cpr}(x_2 = 1, x_3 = 1 \mid x_1 = 0) \, x_2 x_3
          + \log \frac{\mathrm{cpr}(x_1 = 1, x_2 = 1 \mid x_3 = 1)}{\mathrm{cpr}(x_1 = 1, x_2 = 1 \mid x_3 = 0)} \, x_1 x_2 x_3
10 Canonical Parameters versus Moment Parameters

Canonical parameters are harder to interpret but give direct information about the independence structure. Inference in exponential family models turns out to be just conversion from a canonical representation to a moment representation. This sounds simple but can easily be intractable.
11 The Multivariate Gaussian Distribution

N(s; \mu, P): \mu is the mean and P is the covariance:

N(s; \mu, P) = |2\pi P|^{-1/2} \exp\{ -\tfrac{1}{2} (s - \mu)^\top P^{-1} (s - \mu) \}
             = \exp\{ -\tfrac{1}{2} s^\top P^{-1} s + \mu^\top P^{-1} s - \tfrac{1}{2} \mu^\top P^{-1} \mu - \tfrac{1}{2} \log|2\pi P| \}

\log N(s; \mu, P) = -\tfrac{1}{2} s^\top P^{-1} s + \mu^\top P^{-1} s + \text{const}
                  = -\tfrac{1}{2} \mathrm{Tr}(P^{-1} s s^\top) + \mu^\top P^{-1} s + \text{const}
                  =^+ -\tfrac{1}{2} \mathrm{Tr}(P^{-1} s s^\top) + \mu^\top P^{-1} s

Notation: \log f(x) =^+ g(x) \iff f(x) \propto \exp(g(x)) \iff \exists c \in \mathbb{R} : f(x) = c \exp(g(x))
12 Gaussian potentials

Consider a Gaussian potential with mean \mu and covariance \Sigma on x:

\phi(x) = \alpha N(\mu, \Sigma) = \alpha |2\pi\Sigma|^{-1/2} \exp\{ -\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \}

where \int dx \, \phi(x) = \alpha, and |2\pi\Sigma| is a short notation for (2\pi)^d \det \Sigma, where \Sigma is d \times d. If \alpha = 1 the potential is normalised. A general Gaussian potential \phi need not be normalised, so \alpha is in fact an arbitrary positive constant. The exponent is just a quadratic form.
13 Canonical Form

\phi(x) = \exp\{ \log \alpha - \tfrac{1}{2} \log|2\pi\Sigma| - \tfrac{1}{2} \mu^\top \Sigma^{-1} \mu + \mu^\top \Sigma^{-1} x - \tfrac{1}{2} x^\top \Sigma^{-1} x \}
        = \exp\{ g + h^\top x - \tfrac{1}{2} x^\top K x \}

An alternative to the conventional and intuitive moment form. Here we represent the potential by the polynomial coefficients h and K, the natural parameters.
14 Canonical and Moment parametrisations

The moment parameters and canonical parameters are related by

K = \Sigma^{-1}
h = \Sigma^{-1} \mu
g = \log \alpha - \tfrac{1}{2} \log|2\pi\Sigma| - \tfrac{1}{2} \mu^\top \Sigma^{-1} \Sigma \Sigma^{-1} \mu
  = \log \alpha + \tfrac{1}{2} \log\left|\frac{K}{2\pi}\right| - \tfrac{1}{2} h^\top K^{-1} h
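These relations are easy to verify numerically. The following sketch is ours (the function names are invented); it converts between the moment and canonical parametrisations and round-trips:

```python
import numpy as np

def to_canonical(mu, Sigma, alpha=1.0):
    """Moment form (alpha, mu, Sigma) -> canonical form (g, h, K)."""
    K = np.linalg.inv(Sigma)
    h = K @ mu
    g = (np.log(alpha)
         - 0.5 * np.linalg.slogdet(2 * np.pi * Sigma)[1]  # -1/2 log|2 pi Sigma|
         - 0.5 * mu @ K @ mu)
    return g, h, K

def to_moment(g, h, K):
    """Canonical form (g, h, K) -> moment form (alpha, mu, Sigma)."""
    Sigma = np.linalg.inv(K)
    mu = Sigma @ h
    # mu^T Sigma^{-1} mu = h^T K^{-1} h, so invert the g relation:
    log_alpha = g + 0.5 * np.linalg.slogdet(2 * np.pi * Sigma)[1] + 0.5 * h @ Sigma @ h
    return np.exp(log_alpha), mu, Sigma

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
g, h, K = to_canonical(mu, Sigma)
alpha2, mu2, Sigma2 = to_moment(g, h, K)
print(np.allclose(mu, mu2), np.allclose(Sigma, Sigma2), np.isclose(alpha2, 1.0))
```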
15 Jointly Gaussian Vectors

Moment form:

\phi(x_1, x_2) = \alpha N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)
\phi = \alpha |2\pi\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^\top \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} \right\}

Canonical form:

\phi(x_1, x_2) = \exp\left\{ g + \begin{pmatrix} h_1 \\ h_2 \end{pmatrix}^\top \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \tfrac{1}{2} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}^\top \begin{pmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right\}

We need to find a parametric representation of K = \Sigma^{-1} in terms of the partitions \Sigma_{11}, \Sigma_{12}, \Sigma_{21}, \Sigma_{22}.
16 Partitioned Matrix Inverse

Strategy: we will find two matrices L and R such that W = L \Sigma R becomes block diagonal:

L \Sigma R = W
\Sigma = L^{-1} W R^{-1}
\Sigma^{-1} = R W^{-1} L = K
17 Gauss Transformations

Add a multiple of row s to row t: premultiply \Sigma with L(s, t), where

L_{i,j}(s, t) = \begin{cases} 1, & i = j \\ \gamma, & i = t \text{ and } j = s \\ 0, & \text{o/w} \end{cases}

Example: s = 2, t = 1:

\begin{pmatrix} 1 & \gamma \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a + \gamma c & b + \gamma d \\ c & d \end{pmatrix}

The inverse just subtracts what is added:

\begin{pmatrix} 1 & \gamma \\ 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & -\gamma \\ 0 & 1 \end{pmatrix}
18 Gauss Transformations

Given \Sigma, add a multiple of column s to column t: postmultiply \Sigma with R(s, t), where

R_{i,j}(s, t) = \begin{cases} 1, & i = j \\ \gamma, & i = s \text{ and } j = t \\ 0, & \text{o/w} \end{cases}

Example: s = 2, t = 1:

\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \gamma & 1 \end{pmatrix} = \begin{pmatrix} a + \gamma b & b \\ c + \gamma d & d \end{pmatrix}
19 Scalar example

\Sigma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

L \Sigma = \begin{pmatrix} 1 & -b d^{-1} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a - b d^{-1} c & 0 \\ c & d \end{pmatrix}

L \Sigma R = \begin{pmatrix} a - b d^{-1} c & 0 \\ c & d \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -d^{-1} c & 1 \end{pmatrix} = \begin{pmatrix} a - b d^{-1} c & 0 \\ 0 & d \end{pmatrix} = W
20 Scalar example (cont.)

\Sigma = L^{-1} W R^{-1}, so

\Sigma^{-1} = R W^{-1} L
            = \begin{pmatrix} 1 & 0 \\ -d^{-1} c & 1 \end{pmatrix} \begin{pmatrix} (a - b d^{-1} c)^{-1} & 0 \\ 0 & d^{-1} \end{pmatrix} \begin{pmatrix} 1 & -b d^{-1} \\ 0 & 1 \end{pmatrix}
            = \begin{pmatrix} (a - b d^{-1} c)^{-1} & -(a - b d^{-1} c)^{-1} b d^{-1} \\ -d^{-1} c (a - b d^{-1} c)^{-1} & d^{-1} + d^{-1} c (a - b d^{-1} c)^{-1} b d^{-1} \end{pmatrix}
21 Scalar example

We could also use

L \Sigma = \begin{pmatrix} 1 & 0 \\ -c a^{-1} & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ 0 & d - c a^{-1} b \end{pmatrix}

L \Sigma R = \begin{pmatrix} a & b \\ 0 & d - c a^{-1} b \end{pmatrix} \begin{pmatrix} 1 & -a^{-1} b \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & d - c a^{-1} b \end{pmatrix} = W

\Sigma^{-1} = R W^{-1} L = \begin{pmatrix} a^{-1} + a^{-1} b (d - c a^{-1} b)^{-1} c a^{-1} & -a^{-1} b (d - c a^{-1} b)^{-1} \\ -(d - c a^{-1} b)^{-1} c a^{-1} & (d - c a^{-1} b)^{-1} \end{pmatrix}
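The elimination can be checked numerically. A small sketch of ours, with arbitrarily chosen scalar entries, verifies that L \Sigma R is diagonal and that R W^{-1} L reproduces \Sigma^{-1}:

```python
import numpy as np

a, b, c, d = 4.0, 1.0, 2.0, 3.0
Sigma = np.array([[a, b],
                  [c, d]])

# Eliminate b with a row operation, then c with a column operation.
L = np.array([[1.0, -b / d], [0.0, 1.0]])
R = np.array([[1.0, 0.0], [-c / d, 1.0]])
W = L @ Sigma @ R

print(np.allclose(W, np.diag([a - b * c / d, d])))            # -> True
print(np.allclose(R @ np.linalg.inv(W) @ L,
                  np.linalg.inv(Sigma)))                      # -> True
```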
22 Partitioned Matrix Inverse

In the matrix case, this leads to the following dual factorisations of \Sigma:

\Sigma = \begin{pmatrix} I & \Sigma_{12} \Sigma_{22}^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} & 0 \\ 0 & \Sigma_{22} \end{pmatrix} \begin{pmatrix} I & 0 \\ \Sigma_{22}^{-1} \Sigma_{21} & I \end{pmatrix}
       = \begin{pmatrix} I & 0 \\ \Sigma_{21} \Sigma_{11}^{-1} & I \end{pmatrix} \begin{pmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} \end{pmatrix} \begin{pmatrix} I & \Sigma_{11}^{-1} \Sigma_{12} \\ 0 & I \end{pmatrix}
23 The Schur Complement

We will introduce the notation

\Sigma / \Sigma_{22} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}
\Sigma / \Sigma_{11} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}

Determinant:

|\Sigma| = |\Sigma / \Sigma_{11}| \, |\Sigma_{11}| = |\Sigma / \Sigma_{22}| \, |\Sigma_{22}|

\Sigma^{-1} = \begin{pmatrix} I & 0 \\ -\Sigma_{22}^{-1} \Sigma_{21} & I \end{pmatrix} \begin{pmatrix} (\Sigma/\Sigma_{22})^{-1} & 0 \\ 0 & \Sigma_{22}^{-1} \end{pmatrix} \begin{pmatrix} I & -\Sigma_{12} \Sigma_{22}^{-1} \\ 0 & I \end{pmatrix}
            = \begin{pmatrix} I & -\Sigma_{11}^{-1} \Sigma_{12} \\ 0 & I \end{pmatrix} \begin{pmatrix} \Sigma_{11}^{-1} & 0 \\ 0 & (\Sigma/\Sigma_{11})^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -\Sigma_{21} \Sigma_{11}^{-1} & I \end{pmatrix}
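The determinant identity can be checked on a random positive definite matrix. This is an illustrative sketch of ours, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)          # a random symmetric positive definite matrix

# Partition into 2x2 blocks.
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

# Schur complement Sigma / Sigma_22 and the identity |Sigma| = |Sigma/Sigma_22| |Sigma_22|.
schur22 = S11 - S12 @ np.linalg.inv(S22) @ S21
print(np.isclose(np.linalg.det(Sigma),
                 np.linalg.det(schur22) * np.linalg.det(S22)))  # -> True
```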
24 Partitioned Matrix Inverse

\begin{pmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{pmatrix} = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1}
= \begin{pmatrix} (\Sigma/\Sigma_{22})^{-1} & -(\Sigma/\Sigma_{22})^{-1} \Sigma_{12} \Sigma_{22}^{-1} \\ -\Sigma_{22}^{-1} \Sigma_{21} (\Sigma/\Sigma_{22})^{-1} & \Sigma_{22}^{-1} + \Sigma_{22}^{-1} \Sigma_{21} (\Sigma/\Sigma_{22})^{-1} \Sigma_{12} \Sigma_{22}^{-1} \end{pmatrix}
= \begin{pmatrix} \Sigma_{11}^{-1} + \Sigma_{11}^{-1} \Sigma_{12} (\Sigma/\Sigma_{11})^{-1} \Sigma_{21} \Sigma_{11}^{-1} & -\Sigma_{11}^{-1} \Sigma_{12} (\Sigma/\Sigma_{11})^{-1} \\ -(\Sigma/\Sigma_{11})^{-1} \Sigma_{21} \Sigma_{11}^{-1} & (\Sigma/\Sigma_{11})^{-1} \end{pmatrix}

Quite complicated-looking formulas, but straightforward to implement. Caution: \Sigma_{11}^{-1} \neq K_{11} in general!
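The partitioned inverse is indeed straightforward to implement. A numerical sketch of ours (variable names invented), which also illustrates the caution that \Sigma_{11}^{-1} differs from K_{11}:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Sigma = A @ A.T + 5 * np.eye(5)          # a random symmetric positive definite matrix
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

inv = np.linalg.inv
schur22 = S11 - S12 @ inv(S22) @ S21     # Sigma / Sigma_22

# Assemble K block by block from the partitioned-inverse formulas.
K11 = inv(schur22)
K12 = -K11 @ S12 @ inv(S22)
K21 = -inv(S22) @ S21 @ K11
K22 = inv(S22) + inv(S22) @ S21 @ K11 @ S12 @ inv(S22)
K = np.block([[K11, K12], [K21, K22]])

print(np.allclose(K, inv(Sigma)))        # -> True
print(np.allclose(inv(S11), K11))        # differs in general (coupling!)
```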
25 Matrix Inversion Lemma

Read the top-left entries of the two expressions for the partitioned inverse:

(\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21})^{-1} = \Sigma_{11}^{-1} + \Sigma_{11}^{-1} \Sigma_{12} (\Sigma/\Sigma_{11})^{-1} \Sigma_{21} \Sigma_{11}^{-1}

(A - B C^{-1} D)^{-1} = A^{-1} + A^{-1} B (C - D A^{-1} B)^{-1} D A^{-1}
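The lemma is easy to confirm numerically. An illustrative check of ours, with arbitrary conformable matrices (diagonal offsets added only to keep everything safely invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)
B = rng.standard_normal((4, 2))
C = rng.standard_normal((2, 2)) + 4 * np.eye(2)
D = rng.standard_normal((2, 4))

inv = np.linalg.inv
# (A - B C^{-1} D)^{-1} = A^{-1} + A^{-1} B (C - D A^{-1} B)^{-1} D A^{-1}
lhs = inv(A - B @ inv(C) @ D)
rhs = inv(A) + inv(A) @ B @ inv(C - D @ inv(A) @ B) @ D @ inv(A)
print(np.allclose(lhs, rhs))  # -> True
```

Note that the right-hand side only inverts matrices of size 4 and 2; this is the usual reason to apply the lemma, when the correction term is low-rank.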
26 Factorisation of Multivariate Gaussians

Consider the joint distribution over the variable

x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}

where the joint distribution is Gaussian, p(x) = N(x; \mu, \Sigma), with

\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}    \Sigma = \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}
27 Factorisation of Multivariate Gaussians

Find the following:

1. Conditionals: (a) p(x_1 | x_2), (b) p(x_2 | x_1)
2. Marginals: (a) p(x_1), (b) p(x_2)
28 Factorisation of Multivariate Gaussians

Using the partitioned inverse equations, we rearrange

p(x_1, x_2) \propto \exp\left\{ -\tfrac{1}{2} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^\top \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} \right\}

to bring the expression into the form p(x_1) p(x_2 | x_1) or p(x_2) p(x_1 | x_2), where the marginal and conditional can easily be identified. See also Bishop, section 2.3.
29 Factorisation of Multivariate Gaussians

We have the two decompositions

\begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}^{-1}
= \begin{pmatrix} I & 0 \\ -\Sigma_2^{-1} \Sigma_{12}^\top & I \end{pmatrix} \begin{pmatrix} (\Sigma_1 - \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^\top)^{-1} & 0 \\ 0 & \Sigma_2^{-1} \end{pmatrix} \begin{pmatrix} I & -\Sigma_{12} \Sigma_2^{-1} \\ 0 & I \end{pmatrix}
= \begin{pmatrix} I & -\Sigma_1^{-1} \Sigma_{12} \\ 0 & I \end{pmatrix} \begin{pmatrix} \Sigma_1^{-1} & 0 \\ 0 & (\Sigma_2 - \Sigma_{12}^\top \Sigma_1^{-1} \Sigma_{12})^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -\Sigma_{12}^\top \Sigma_1^{-1} & I \end{pmatrix}

We let s_i = x_i - \mu_i and use the first decomposition:

p(s_1, s_2) \propto \exp\left\{ -\tfrac{1}{2} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}^\top \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}^{-1} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} \right\}
30 Substituting the first decomposition and completing the square gives

p(s_1, s_2) \propto \exp\left\{ -\tfrac{1}{2} (s_1 - \Sigma_{12} \Sigma_2^{-1} s_2)^\top (\Sigma_1 - \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^\top)^{-1} (s_1 - \Sigma_{12} \Sigma_2^{-1} s_2) - \tfrac{1}{2} s_2^\top \Sigma_2^{-1} s_2 \right\}
\propto N(s_1; \Sigma_{12} \Sigma_2^{-1} s_2, \, \Sigma_1 - \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^\top) \, N(s_2; 0, \Sigma_2)
= N(x_1; \mu_1 + \Sigma_{12} \Sigma_2^{-1} (x_2 - \mu_2), \, \Sigma_1 - \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^\top) \, N(x_2; \mu_2, \Sigma_2)

This leads to a factorisation of the form p(x_2) p(x_1 | x_2). The second decomposition leads to the other factorisation, p(x_1) p(x_2 | x_1).
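The resulting conditional p(x_1 | x_2) can be computed directly from the moment parameters. A scalar sketch of ours (the function name is invented), using the conditional mean and covariance read off from the factorisation:

```python
import numpy as np

def condition_on_x2(mu1, mu2, S1, S12, S2, x2):
    """Return mean and covariance of p(x1 | x2) for a joint Gaussian with
    mean (mu1, mu2) and covariance blocks S1, S12, S2."""
    mu_c = mu1 + S12 @ np.linalg.inv(S2) @ (x2 - mu2)
    S_c = S1 - S12 @ np.linalg.inv(S2) @ S12.T
    return mu_c, S_c

# A 1-D example in each block, so the numbers are easy to check by hand.
mu1, mu2 = np.array([0.0]), np.array([1.0])
S1 = np.array([[2.0]])
S12 = np.array([[0.8]])
S2 = np.array([[1.0]])

mu_c, S_c = condition_on_x2(mu1, mu2, S1, S12, S2, x2=np.array([2.0]))
print(mu_c, S_c)  # mean 0 + 0.8*(2-1) = 0.8, variance 2 - 0.8*0.8 = 1.36
```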
31 Summary

Keywords:
- Multivariate Bernoulli
- Multivariate Gaussian
- Partitioned inverse equations, matrix inversion lemma
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationLecture 25: Review. Statistics 104. April 23, Colin Rundel
Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 143 Part IV
More information2.3. The Gaussian Distribution
78 2. PROBABILITY DISTRIBUTIONS Figure 2.5 Plots of the Dirichlet distribution over three variables, where the two horizontal axes are coordinates in the plane of the simplex and the vertical axis corresponds
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA
More informationGaussian Processes for Machine Learning
Gaussian Processes for Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics Tübingen, Germany carl@tuebingen.mpg.de Carlos III, Madrid, May 2006 The actual science of
More informationCOMPUTATIONAL MULTI-POINT BME AND BME CONFIDENCE SETS
VII. COMPUTATIONAL MULTI-POINT BME AND BME CONFIDENCE SETS Until now we have considered the estimation of the value of a S/TRF at one point at the time, independently of the estimated values at other estimation
More informationReview (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology
Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna
More informationCS 559: Machine Learning Fundamentals and Applications 2 nd Set of Notes
1 CS 559: Machine Learning Fundamentals and Applications 2 nd Set of Notes Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Office: Lieb 215 Overview
More informationDynamic System Identification using HDMR-Bayesian Technique
Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in
More informationChapter 1: Systems of linear equations and matrices. Section 1.1: Introduction to systems of linear equations
Chapter 1: Systems of linear equations and matrices Section 1.1: Introduction to systems of linear equations Definition: A linear equation in n variables can be expressed in the form a 1 x 1 + a 2 x 2
More informationMultivariate Gaussian Analysis
BS2 Statistical Inference, Lecture 7, Hilary Term 2009 February 13, 2009 Marginal and conditional distributions For a positive definite covariance matrix Σ, the multivariate Gaussian distribution has density
More informationContinuous Random Variables
1 / 24 Continuous Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay February 27, 2013 2 / 24 Continuous Random Variables
More informationCS Lecture 13. More Maximum Likelihood
CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood
More informationAlternative Parameterizations of Markov Networks. Sargur Srihari
Alternative Parameterizations of Markov Networks Sargur srihari@cedar.buffalo.edu 1 Topics Three types of parameterization 1. Gibbs Parameterization 2. Factor Graphs 3. Log-linear Models Features (Ising,
More informationMachine Learning (CS 567) Lecture 5
Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationLecture 5: GPs and Streaming regression
Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X
More information7.5 Operations with Matrices. Copyright Cengage Learning. All rights reserved.
7.5 Operations with Matrices Copyright Cengage Learning. All rights reserved. What You Should Learn Decide whether two matrices are equal. Add and subtract matrices and multiply matrices by scalars. Multiply
More information{ p if x = 1 1 p if x = 0
Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =
More informationCMPE 547 Bayesian Statistics and Machine Learning
CMPE 547 Bayesian Statistics and Machine Learning A. Taylan Cemgil Dept. of Computer Engineering Boğaziçi University Cemgil Bayesian Machine Learning. Spring 2017, Istanbul A simple problem Die 1: λ {,,,,,
More information