CMPE 58K Bayesian Statistics and Machine Learning Lecture 5


1 CMPE 58K Bayesian Statistics and Machine Learning, Lecture 5

Multivariate distributions: Gaussian, Bernoulli, probability tables

Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
Instructor: A. Taylan Cemgil
Fall 2009

2 Multivariate Bernoulli

A probability table with binary states:

p(x_1, x_2):
            x_2 = 0    x_2 = 1
x_1 = 0     p_{00}     p_{01}
x_1 = 1     p_{10}     p_{11}

p(x_1, x_2) = p_{00}^{(1-x_1)(1-x_2)} \, p_{10}^{x_1(1-x_2)} \, p_{01}^{(1-x_1)x_2} \, p_{11}^{x_1 x_2}
            = \exp\left( (1 - x_1 - x_2 + x_1 x_2) \log p_{00} + (x_1 - x_1 x_2) \log p_{10} + (x_2 - x_1 x_2) \log p_{01} + x_1 x_2 \log p_{11} \right)
            = \exp\left( \log p_{00} + x_1 \log \frac{p_{10}}{p_{00}} + x_2 \log \frac{p_{01}}{p_{00}} + x_1 x_2 \log \frac{p_{00} p_{11}}{p_{10} p_{01}} \right)

3 Multivariate Bernoulli: Independent case

p(x_1, x_2) = p(x_1) p(x_2):

p(x_1, x_2):
            x_2 = 0    x_2 = 1
x_1 = 0     p_0 q_0    p_0 q_1
x_1 = 1     p_1 q_0    p_1 q_1

p(x_1, x_2) = \exp\left( \log p_0 q_0 + x_1 \log \frac{p_1 q_0}{p_0 q_0} + x_2 \log \frac{p_0 q_1}{p_0 q_0} + x_1 x_2 \log \frac{p_0 q_0 \, p_1 q_1}{p_1 q_0 \, p_0 q_1} \right)
            = \exp\left( \log p_0 q_0 + x_1 \log \frac{p_1}{p_0} + x_2 \log \frac{q_1}{q_0} + x_1 x_2 \log 1 \right)

The coupling parameter is zero.

4 Multivariate Bernoulli: Canonical parameters

p(x_1, x_2) = \exp\left( g + h_1 x_1 + h_2 x_2 + \theta_{1,2} x_1 x_2 \right)

h_1 = \log \frac{p_{10}}{p_{00}}    h_2 = \log \frac{p_{01}}{p_{00}}    \theta_{1,2} = \log \frac{p_{00} p_{11}}{p_{10} p_{01}}

h_i: threshold
\theta_{i,j}: pairwise coupling

The p's are the moment parameters. Coupling parameters are harder to interpret, but provide direct information about the conditional independence structure.
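A small numerical sketch of this moment-to-canonical conversion (plain NumPy; the table values are made up) that checks the identity above:

```python
import numpy as np

# A made-up 2x2 probability table p[x1, x2] (moment parameters)
p = np.array([[0.30, 0.20],
              [0.10, 0.40]])

# Canonical parameters, as defined on this slide
g = np.log(p[0, 0])
h1 = np.log(p[1, 0] / p[0, 0])
h2 = np.log(p[0, 1] / p[0, 0])
theta = np.log(p[0, 0] * p[1, 1] / (p[1, 0] * p[0, 1]))

# Check: exp(g + h1 x1 + h2 x2 + theta x1 x2) reproduces the table
for x1 in (0, 1):
    for x2 in (0, 1):
        assert np.isclose(np.exp(g + h1 * x1 + h2 * x2 + theta * x1 * x2),
                          p[x1, x2])
print(h1, h2, theta)
```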

5 Example

x_i \in \{0, 1\}

p(x_1, x_2) \propto \exp\left( 2 x_1 - 0.5 x_2 - 3 x_1 x_2 \right)
q(x_1, x_2) \propto \exp\left( 2 x_1 - 0.5 x_2 + 0 \cdot x_1 x_2 \right)

Are x_1 and x_2 independent under p, or under q? What are p(x_1) and q(x_1)?
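A brute-force check of this example (coefficients as reconstructed above; the normalizer is obtained by summing over the four states):

```python
import numpy as np

def table(h1, h2, theta):
    """Normalized 2x2 table for p(x1,x2) proportional to exp(h1*x1 + h2*x2 + theta*x1*x2)."""
    t = np.array([[np.exp(h1 * x1 + h2 * x2 + theta * x1 * x2)
                   for x2 in (0, 1)] for x1 in (0, 1)])
    return t / t.sum()

p = table(2.0, -0.5, -3.0)   # nonzero coupling
q = table(2.0, -0.5,  0.0)   # zero coupling

for name, t in (("p", p), ("q", q)):
    marg1, marg2 = t.sum(axis=1), t.sum(axis=0)   # marginals of x1 and x2
    indep = np.allclose(t, np.outer(marg1, marg2))
    print(name, "independent:", indep, " marginal of x1:", marg1)
```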

6 Odds and the Cross Product Ratio (cpr)

For two events A and B:

\text{odds}(A) = \frac{p(A)}{p(A^c)}
\text{odds}(A \mid B) = \frac{p(A \mid B)}{p(A^c \mid B)}
\text{cpr}(A, B) = \frac{p(A \cap B) \, p(A^c \cap B^c)}{p(A^c \cap B) \, p(A \cap B^c)} = \text{odds}(A \mid B) / \text{odds}(A \mid B^c)

When the events A and B are independent, we have p(A \mid B) = p(A), so

\text{cpr}(A, B) = \text{odds}(A) / \text{odds}(A) = 1
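A minimal sketch (made-up joint tables) computing the cross product ratio, confirming cpr = 1 for an independent table:

```python
import numpy as np

def cpr(joint):
    """Cross product ratio of a 2x2 joint table over events A (rows) and B (cols).
    Index 1 means the event occurs, 0 means its complement."""
    return joint[1, 1] * joint[0, 0] / (joint[0, 1] * joint[1, 0])

dependent   = np.array([[0.30, 0.20],
                        [0.10, 0.40]])
independent = np.outer([0.7, 0.3], [0.4, 0.6])   # p(A, B) = p(A) p(B)

print(cpr(dependent))     # different from 1
print(cpr(independent))   # equal to 1 up to rounding
```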

7 Odds and the Cross Product Ratio (cpr)

The univariate Bernoulli distribution is parametrised by the event A : x_1 = 1:

p(x_1) = \exp\left( \log p_0 + x_1 \log \frac{p_1}{p_0} \right) \propto \exp\left( x_1 \log \text{odds}(x_1 = 1) \right)

Bivariate Bernoulli distribution:

p(x_1, x_2) = \exp\left( \log p_{00} + x_1 \log \frac{p_{10}}{p_{00}} + x_2 \log \frac{p_{01}}{p_{00}} + x_1 x_2 \log \frac{p_{00} p_{11}}{p_{10} p_{01}} \right)
            \propto \exp\left( x_1 \log \text{odds}(x_1 = 1 \mid x_2 = 0) + x_2 \log \text{odds}(x_2 = 1 \mid x_1 = 0) + x_1 x_2 \log \text{cpr}(x_1 = 1, x_2 = 1) \right)

8 Multivariate Bernoulli

For the trivariate case, we have pairwise and triple interactions:

p(x_1, x_2, x_3) = p_{000}^{(1-x_1)(1-x_2)(1-x_3)} \, p_{100}^{x_1(1-x_2)(1-x_3)} \, p_{010}^{(1-x_1)x_2(1-x_3)} \, p_{110}^{x_1 x_2 (1-x_3)}
                   \times p_{001}^{(1-x_1)(1-x_2)x_3} \, p_{101}^{x_1(1-x_2)x_3} \, p_{011}^{(1-x_1)x_2 x_3} \, p_{111}^{x_1 x_2 x_3}

\log p(x) = \log p_{000}
  + \log\frac{p_{100}}{p_{000}} x_1 + \log\frac{p_{010}}{p_{000}} x_2 + \log\frac{p_{001}}{p_{000}} x_3
  + \log\frac{p_{000} p_{110}}{p_{010} p_{100}} x_1 x_2 + \log\frac{p_{000} p_{101}}{p_{001} p_{100}} x_1 x_3 + \log\frac{p_{000} p_{011}}{p_{001} p_{010}} x_2 x_3
  + \log\frac{p_{001} p_{010} p_{100} p_{111}}{p_{000} p_{011} p_{101} p_{110}} x_1 x_2 x_3
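A sketch that recovers these log-linear coefficients from an arbitrary (randomly generated) 2x2x2 table by inclusion-exclusion over variable subsets, then reconstructs log p(x):

```python
import numpy as np
from itertools import combinations, product

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()              # arbitrary normalized 2x2x2 table
logp = np.log(p)

def coeff(S):
    """Log-linear coefficient of prod_{i in S} x_i, by inclusion-exclusion:
    alternating sum over configurations of the variables in S (others clamped to 0)."""
    lam = 0.0
    for y in product((0, 1), repeat=len(S)):
        x = [0, 0, 0]
        for i, yi in zip(S, y):
            x[i] = yi
        lam += (-1) ** (len(S) - sum(y)) * logp[tuple(x)]
    return lam

lams = {S: coeff(S) for r in range(4) for S in combinations(range(3), r)}

# Check: log p(x) = sum_S lambda_S * prod_{i in S} x_i, for every state x
for x in product((0, 1), repeat=3):
    val = sum(lam * np.prod([x[i] for i in S]) for S, lam in lams.items())
    assert np.isclose(val, logp[x])

# Triple interaction: log (p001 p010 p100 p111) / (p000 p011 p101 p110)
print(lams[(0, 1, 2)])
```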

9 Equivalently, in terms of odds and cross product ratios:

\log p(x) = \log p_{000}
  + \log \text{odds}(x_1 = 1 \mid x_2 = 0, x_3 = 0) \, x_1 + \log \text{odds}(x_2 = 1 \mid x_1 = 0, x_3 = 0) \, x_2 + \log \text{odds}(x_3 = 1 \mid x_1 = 0, x_2 = 0) \, x_3
  + \log \text{cpr}(x_1 = 1, x_2 = 1 \mid x_3 = 0) \, x_1 x_2 + \log \text{cpr}(x_1 = 1, x_3 = 1 \mid x_2 = 0) \, x_1 x_3 + \log \text{cpr}(x_2 = 1, x_3 = 1 \mid x_1 = 0) \, x_2 x_3
  + \log \frac{\text{cpr}(x_1 = 1, x_2 = 1 \mid x_3 = 1)}{\text{cpr}(x_1 = 1, x_2 = 1 \mid x_3 = 0)} \, x_1 x_2 x_3

10 Canonical Parameters versus Moment Parameters

Canonical parameters are harder to interpret, but give information about the independence structure.

Inference in exponential family models turns out to be just conversion from a canonical representation to a moment representation. This sounds simple, but can easily be intractable.

11 The Multivariate Gaussian Distribution

N(s; \mu, P), where \mu is the mean and P is the covariance:

N(s; \mu, P) = |2\pi P|^{-1/2} \exp\left( -\frac{1}{2} (s - \mu)^\top P^{-1} (s - \mu) \right)
             = \exp\left( -\frac{1}{2} s^\top P^{-1} s + \mu^\top P^{-1} s - \frac{1}{2} \mu^\top P^{-1} \mu - \frac{1}{2} \log |2\pi P| \right)

\log N(s; \mu, P) = -\frac{1}{2} s^\top P^{-1} s + \mu^\top P^{-1} s + \text{const}
                  = -\frac{1}{2} \mathrm{Tr}\left( P^{-1} s s^\top \right) + \mu^\top P^{-1} s + \text{const}
                  =^+ -\frac{1}{2} \mathrm{Tr}\left( P^{-1} s s^\top \right) + \mu^\top P^{-1} s

Notation: \log f(x) =^+ g(x) \iff f(x) \propto \exp(g(x)) \iff \exists c \in \mathbb{R} : f(x) = c \exp(g(x))
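A quick numerical check (random µ, P, s, made-up dimensions) that the quadratic form and the trace form above agree, validated against scipy's log-density:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
d = 3
mu = rng.normal(size=d)
A = rng.normal(size=(d, d))
P = A @ A.T + d * np.eye(d)       # a random covariance matrix
s = rng.normal(size=d)

Pinv = np.linalg.inv(P)
const = -0.5 * np.log(np.linalg.det(2 * np.pi * P)) - 0.5 * mu @ Pinv @ mu

quad_form = (-0.5 * (s - mu) @ Pinv @ (s - mu)
             - 0.5 * np.log(np.linalg.det(2 * np.pi * P)))
trace_form = -0.5 * np.trace(Pinv @ np.outer(s, s)) + mu @ Pinv @ s + const

assert np.isclose(quad_form, trace_form)
assert np.isclose(quad_form, multivariate_normal(mu, P).logpdf(s))
```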

12 Gaussian potentials

Consider a Gaussian potential with mean \mu and covariance \Sigma on x:

\phi(x) = \alpha N(\mu, \Sigma) = \alpha |2\pi\Sigma|^{-1/2} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)

where \int dx \, \phi(x) = \alpha and |2\pi\Sigma| is a short notation for (2\pi)^d \det \Sigma, with \Sigma of size d \times d.

If \alpha = 1 the potential is normalized. A general Gaussian potential \phi need not be normalized, so \alpha is in fact an arbitrary positive constant. The exponent is just a quadratic form.

13 Canonical Form

\phi(x) = \exp\{ \log \alpha - \frac{1}{2} \log |2\pi\Sigma| - \frac{1}{2} \mu^\top \Sigma^{-1} \mu + \mu^\top \Sigma^{-1} x - \frac{1}{2} x^\top \Sigma^{-1} x \}
        = \exp\left( g + h^\top x - \frac{1}{2} x^\top K x \right)

This is an alternative to the conventional and intuitive moment form. Here we represent the potential by the polynomial coefficients h and K, known as the natural parameters.

14 Canonical and Moment parametrisations

The moment parameters and canonical parameters are related by

K = \Sigma^{-1}
h = \Sigma^{-1} \mu
g = \log \alpha - \frac{1}{2} \log |2\pi\Sigma| - \frac{1}{2} \mu^\top \Sigma^{-1} \Sigma \Sigma^{-1} \mu
  = \log \alpha + \frac{1}{2} \log \left| \frac{K}{2\pi} \right| - \frac{1}{2} h^\top K^{-1} h
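A sketch of the moment-to-canonical conversion (random µ, Σ), checking that exp(g + hᵀx − ½xᵀKx) with α = 1 matches the normalized Gaussian density at a test point:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
d = 2
mu = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + np.eye(d)

# Moment -> canonical (alpha = 1, i.e. a normalized potential)
K = np.linalg.inv(Sigma)
h = K @ mu
g = 0.5 * np.log(np.linalg.det(K / (2 * np.pi))) - 0.5 * h @ np.linalg.inv(K) @ h

x = rng.normal(size=d)
canonical = np.exp(g + h @ x - 0.5 * x @ K @ x)
assert np.isclose(canonical, multivariate_normal(mu, Sigma).pdf(x))
```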

15 Jointly Gaussian Vectors

Moment form:

\phi(x_1, x_2) = \alpha N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)

\phi = \alpha |2\pi\Sigma|^{-1/2} \exp\left( -\frac{1}{2} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^\top \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} \right)

Canonical form:

\phi(x_1, x_2) = \exp\left( g + \begin{pmatrix} h_1 \\ h_2 \end{pmatrix}^\top \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \frac{1}{2} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}^\top \begin{pmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right)

We need to find a parametric representation of K = \Sigma^{-1} in terms of the partitions \Sigma_{11}, \Sigma_{12}, \Sigma_{21}, \Sigma_{22}.

16 Partitioned Matrix Inverse

Strategy: we will find two matrices L and R such that W = L\Sigma R becomes block diagonal:

L\Sigma R = W
\Sigma = L^{-1} W R^{-1}
\Sigma^{-1} = R W^{-1} L = K

17 Gauss Transformations

To add a multiple of row s to row t, premultiply \Sigma with L(s, t), where

L_{i,j}(s, t) = \begin{cases} 1 & i = j \\ \gamma & i = t \text{ and } j = s \\ 0 & \text{otherwise} \end{cases}

Example (s = 2, t = 1):

\begin{pmatrix} 1 & \gamma \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a + \gamma c & b + \gamma d \\ c & d \end{pmatrix}

The inverse just subtracts what is added:

\begin{pmatrix} 1 & \gamma \\ 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & -\gamma \\ 0 & 1 \end{pmatrix}

18 Gauss Transformations

Given \Sigma, to add a multiple of column s to column t, postmultiply \Sigma with R(s, t), where

R_{i,j}(s, t) = \begin{cases} 1 & i = j \\ \gamma & i = s \text{ and } j = t \\ 0 & \text{otherwise} \end{cases}

Example (s = 2, t = 1):

\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \gamma & 1 \end{pmatrix} = \begin{pmatrix} a + \gamma b & b \\ c + \gamma d & d \end{pmatrix}
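A small illustrative helper (the function names row_gauss/col_gauss are mine, not from the slides) that builds these Gauss transformation matrices and reproduces the two examples:

```python
import numpy as np

def row_gauss(n, s, t, gamma):
    """L(s,t): premultiplying adds gamma * (row s) to row t. 1-based s, t."""
    L = np.eye(n)
    L[t - 1, s - 1] = gamma
    return L

def col_gauss(n, s, t, gamma):
    """R(s,t): postmultiplying adds gamma * (column s) to column t."""
    R = np.eye(n)
    R[s - 1, t - 1] = gamma
    return R

Sigma = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
g = 0.5
print(row_gauss(2, 2, 1, g) @ Sigma)   # row 1 += g * row 2
print(Sigma @ col_gauss(2, 2, 1, g))   # column 1 += g * column 2

# The inverse subtracts what was added
assert np.allclose(np.linalg.inv(row_gauss(2, 2, 1, g)), row_gauss(2, 2, 1, -g))
```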

19 Scalar example

\Sigma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

L\Sigma = \begin{pmatrix} 1 & -b d^{-1} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a - b d^{-1} c & 0 \\ c & d \end{pmatrix}

L\Sigma R = \begin{pmatrix} a - b d^{-1} c & 0 \\ c & d \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -d^{-1} c & 1 \end{pmatrix} = \begin{pmatrix} a - b d^{-1} c & 0 \\ 0 & d \end{pmatrix} = W

20 Scalar example (continued)

\Sigma = L^{-1} W R^{-1}

\Sigma^{-1} = R W^{-1} L
            = \begin{pmatrix} 1 & 0 \\ -d^{-1} c & 1 \end{pmatrix} \begin{pmatrix} (a - b d^{-1} c)^{-1} & 0 \\ 0 & d^{-1} \end{pmatrix} \begin{pmatrix} 1 & -b d^{-1} \\ 0 & 1 \end{pmatrix}
            = \begin{pmatrix} (a - b d^{-1} c)^{-1} & -(a - b d^{-1} c)^{-1} b d^{-1} \\ -d^{-1} c (a - b d^{-1} c)^{-1} & d^{-1} + d^{-1} c (a - b d^{-1} c)^{-1} b d^{-1} \end{pmatrix}
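A numeric check of this 2x2 case (a, b, c, d are made-up scalars):

```python
import numpy as np

a, b, c, d = 4.0, 1.0, 2.0, 3.0
Sigma = np.array([[a, b],
                  [c, d]])

schur = a - b * c / d                      # the Schur complement a - b d^{-1} c
L = np.array([[1.0, -b / d], [0.0, 1.0]])
R = np.array([[1.0, 0.0], [-c / d, 1.0]])
W = L @ Sigma @ R                          # diagonal after the two transforms
assert np.allclose(W, np.diag([schur, d]))

Sigma_inv = R @ np.linalg.inv(W) @ L       # Sigma^{-1} = R W^{-1} L
assert np.allclose(Sigma_inv, np.linalg.inv(Sigma))
```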

21 Scalar example

We could also use

L\Sigma = \begin{pmatrix} 1 & 0 \\ -c a^{-1} & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ 0 & d - c a^{-1} b \end{pmatrix}

L\Sigma R = \begin{pmatrix} a & b \\ 0 & d - c a^{-1} b \end{pmatrix} \begin{pmatrix} 1 & -a^{-1} b \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & d - c a^{-1} b \end{pmatrix} = W

\Sigma^{-1} = R W^{-1} L = \begin{pmatrix} a^{-1} + a^{-1} b (d - c a^{-1} b)^{-1} c a^{-1} & -a^{-1} b (d - c a^{-1} b)^{-1} \\ -(d - c a^{-1} b)^{-1} c a^{-1} & (d - c a^{-1} b)^{-1} \end{pmatrix}

22 Partitioned Matrix Inverse

In the matrix case, this leads to the following dual factorizations of \Sigma:

\Sigma = \begin{pmatrix} I & \Sigma_{12} \Sigma_{22}^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} & 0 \\ 0 & \Sigma_{22} \end{pmatrix} \begin{pmatrix} I & 0 \\ \Sigma_{22}^{-1} \Sigma_{21} & I \end{pmatrix}

\Sigma = \begin{pmatrix} I & 0 \\ \Sigma_{21} \Sigma_{11}^{-1} & I \end{pmatrix} \begin{pmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} \end{pmatrix} \begin{pmatrix} I & \Sigma_{11}^{-1} \Sigma_{12} \\ 0 & I \end{pmatrix}

23 The Schur Complement

We will introduce the notation

\Sigma/\Sigma_{22} \equiv \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}
\Sigma/\Sigma_{11} \equiv \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}

Determinant:

|\Sigma| = |\Sigma/\Sigma_{11}| \, |\Sigma_{11}| = |\Sigma/\Sigma_{22}| \, |\Sigma_{22}|

Inverse:

\Sigma^{-1} = \begin{pmatrix} I & 0 \\ -\Sigma_{22}^{-1} \Sigma_{21} & I \end{pmatrix} \begin{pmatrix} (\Sigma/\Sigma_{22})^{-1} & 0 \\ 0 & \Sigma_{22}^{-1} \end{pmatrix} \begin{pmatrix} I & -\Sigma_{12} \Sigma_{22}^{-1} \\ 0 & I \end{pmatrix}
            = \begin{pmatrix} I & -\Sigma_{11}^{-1} \Sigma_{12} \\ 0 & I \end{pmatrix} \begin{pmatrix} \Sigma_{11}^{-1} & 0 \\ 0 & (\Sigma/\Sigma_{11})^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -\Sigma_{21} \Sigma_{11}^{-1} & I \end{pmatrix}
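A numeric sanity check (a random partitioned SPD matrix) of the two Schur complements, the determinant identity, and the first inverse factorization:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2 = 2, 3
A = rng.normal(size=(n1 + n2, n1 + n2))
S = A @ A.T + np.eye(n1 + n2)                     # random SPD Sigma
S11, S12 = S[:n1, :n1], S[:n1, n1:]
S21, S22 = S[n1:, :n1], S[n1:, n1:]

schur22 = S11 - S12 @ np.linalg.inv(S22) @ S21    # Sigma/Sigma22
schur11 = S22 - S21 @ np.linalg.inv(S11) @ S12    # Sigma/Sigma11

det = np.linalg.det
assert np.isclose(det(S), det(schur11) * det(S11))
assert np.isclose(det(S), det(schur22) * det(S22))

I1, I2 = np.eye(n1), np.eye(n2)
Z12, Z21 = np.zeros((n1, n2)), np.zeros((n2, n1))
left  = np.block([[I1, Z12], [-np.linalg.inv(S22) @ S21, I2]])
mid   = np.block([[np.linalg.inv(schur22), Z12], [Z21, np.linalg.inv(S22)]])
right = np.block([[I1, -S12 @ np.linalg.inv(S22)], [Z21, I2]])
assert np.allclose(left @ mid @ right, np.linalg.inv(S))
```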

24 Partitioned Matrix Inverse

\begin{pmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{pmatrix} = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1}
= \begin{pmatrix} (\Sigma/\Sigma_{22})^{-1} & -(\Sigma/\Sigma_{22})^{-1} \Sigma_{12} \Sigma_{22}^{-1} \\ -\Sigma_{22}^{-1} \Sigma_{21} (\Sigma/\Sigma_{22})^{-1} & \Sigma_{22}^{-1} + \Sigma_{22}^{-1} \Sigma_{21} (\Sigma/\Sigma_{22})^{-1} \Sigma_{12} \Sigma_{22}^{-1} \end{pmatrix}
= \begin{pmatrix} \Sigma_{11}^{-1} + \Sigma_{11}^{-1} \Sigma_{12} (\Sigma/\Sigma_{11})^{-1} \Sigma_{21} \Sigma_{11}^{-1} & -\Sigma_{11}^{-1} \Sigma_{12} (\Sigma/\Sigma_{11})^{-1} \\ -(\Sigma/\Sigma_{11})^{-1} \Sigma_{21} \Sigma_{11}^{-1} & (\Sigma/\Sigma_{11})^{-1} \end{pmatrix}

Quite complicated-looking formulas, but straightforward to implement.

Caution: \Sigma_{11}^{-1} \neq K_{11} in general!
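A direct implementation of these block formulas (random test matrix, seed is arbitrary), compared against np.linalg.inv:

```python
import numpy as np

def partitioned_inverse(S11, S12, S21, S22):
    """Blockwise inverse via the Schur complement Sigma/Sigma22."""
    S22inv = np.linalg.inv(S22)
    schur22_inv = np.linalg.inv(S11 - S12 @ S22inv @ S21)   # (Sigma/Sigma22)^{-1}
    K11 = schur22_inv
    K12 = -schur22_inv @ S12 @ S22inv
    K21 = -S22inv @ S21 @ schur22_inv
    K22 = S22inv + S22inv @ S21 @ schur22_inv @ S12 @ S22inv
    return np.block([[K11, K12], [K21, K22]])

rng = np.random.default_rng(4)
n1, n2 = 2, 2
A = rng.normal(size=(n1 + n2, n1 + n2))
S = A @ A.T + np.eye(n1 + n2)
K = partitioned_inverse(S[:n1, :n1], S[:n1, n1:], S[n1:, :n1], S[n1:, n1:])
assert np.allclose(K, np.linalg.inv(S))

# The caution from the slide: inv(Sigma11) != K11 in general
print(np.allclose(np.linalg.inv(S[:n1, :n1]), K[:n1, :n1]))   # False
```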

25 Matrix Inversion Lemma

Read off the diagonal entries:

\left( \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)^{-1} = \Sigma_{11}^{-1} + \Sigma_{11}^{-1} \Sigma_{12} (\Sigma/\Sigma_{11})^{-1} \Sigma_{21} \Sigma_{11}^{-1}

\left( A - B C^{-1} D \right)^{-1} = A^{-1} + A^{-1} B \left( C - D A^{-1} B \right)^{-1} D A^{-1}
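A quick numeric check of the lemma in the (A, B, C, D) form, with random well-conditioned matrices:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 2
A = rng.normal(size=(n, n)) + 5 * np.eye(n)   # keep things well conditioned
C = rng.normal(size=(m, m)) + 5 * np.eye(m)
B = rng.normal(size=(n, m))
D = rng.normal(size=(m, n))

Ainv, Cinv = np.linalg.inv(A), np.linalg.inv(C)
lhs = np.linalg.inv(A - B @ Cinv @ D)
rhs = Ainv + Ainv @ B @ np.linalg.inv(C - D @ Ainv @ B) @ D @ Ainv
assert np.allclose(lhs, rhs)
```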

26 Factorisation of Multivariate Gaussians

Consider the joint distribution over the variable

x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}

where the joint distribution is Gaussian, p(x) = N(x; \mu, \Sigma), with

\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}    \Sigma = \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}

27 Factorisation of Multivariate Gaussians

Find the following:

1. Conditionals
   (a) p(x_1 | x_2)
   (b) p(x_2 | x_1)
2. Marginals
   (a) p(x_1)
   (b) p(x_2)

28 Factorisation of Multivariate Gaussians

Using the partitioned inverse equations, we rearrange

p(x_1, x_2) \propto \exp\left( -\frac{1}{2} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^\top \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} \right)

to bring the expression into the form p(x_1) p(x_2 | x_1) or p(x_2) p(x_1 | x_2), where the marginal and conditional can be easily identified. See also Bishop, Section 2.3.

29 Factorisation of Multivariate Gaussians

We have the two decompositions

\begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}^{-1}
= \begin{pmatrix} I & 0 \\ -\Sigma_2^{-1} \Sigma_{12}^\top & I \end{pmatrix} \begin{pmatrix} \left( \Sigma_1 - \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^\top \right)^{-1} & 0 \\ 0 & \Sigma_2^{-1} \end{pmatrix} \begin{pmatrix} I & -\Sigma_{12} \Sigma_2^{-1} \\ 0 & I \end{pmatrix}
= \begin{pmatrix} I & -\Sigma_1^{-1} \Sigma_{12} \\ 0 & I \end{pmatrix} \begin{pmatrix} \Sigma_1^{-1} & 0 \\ 0 & \left( \Sigma_2 - \Sigma_{12}^\top \Sigma_1^{-1} \Sigma_{12} \right)^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -\Sigma_{12}^\top \Sigma_1^{-1} & I \end{pmatrix}

We let s_i = x_i - \mu_i and use the first decomposition:

p(s_1, s_2) \propto \exp\left( -\frac{1}{2} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}^\top \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}^{-1} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} \right)

30 Expanding the quadratic form with the first decomposition:

= \exp\left( -\frac{1}{2} \left( s_1 - \Sigma_{12} \Sigma_2^{-1} s_2 \right)^\top \left( \Sigma_1 - \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^\top \right)^{-1} \left( s_1 - \Sigma_{12} \Sigma_2^{-1} s_2 \right) - \frac{1}{2} s_2^\top \Sigma_2^{-1} s_2 \right)

\propto N\left( s_1; \, \Sigma_{12} \Sigma_2^{-1} s_2, \; \Sigma_1 - \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^\top \right) \, N\left( s_2; \, 0, \, \Sigma_2 \right)
= N\left( x_1; \, \mu_1 + \Sigma_{12} \Sigma_2^{-1} (x_2 - \mu_2), \; \Sigma_1 - \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^\top \right) \, N\left( x_2; \, \mu_2, \, \Sigma_2 \right)

This leads to a factorisation of the form p(x_2) p(x_1 | x_2). The second decomposition leads to the other factorisation, p(x_1) p(x_2 | x_1).
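A sketch verifying this read-off numerically (random joint, made-up block sizes): the conditional density times the marginal should equal the joint density at any point:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(6)
d1, d2 = 2, 2
mu = rng.normal(size=d1 + d2)
A = rng.normal(size=(d1 + d2, d1 + d2))
Sig = A @ A.T + np.eye(d1 + d2)
S1, S12, S2 = Sig[:d1, :d1], Sig[:d1, d1:], Sig[d1:, d1:]
mu1, mu2 = mu[:d1], mu[d1:]

x = rng.normal(size=d1 + d2)
x1, x2 = x[:d1], x[d1:]

# p(x1 | x2) and p(x2), as read off from the factorisation above
cond_mean = mu1 + S12 @ np.linalg.inv(S2) @ (x2 - mu2)
cond_cov = S1 - S12 @ np.linalg.inv(S2) @ S12.T
log_joint = mvn(mu, Sig).logpdf(x)
log_factored = mvn(cond_mean, cond_cov).logpdf(x1) + mvn(mu2, S2).logpdf(x2)
assert np.isclose(log_joint, log_factored)
```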

31 Summary

Keywords: Multivariate Bernoulli, Multivariate Gaussian, Partitioned Inverse Equations, Matrix Inversion Lemma
