CS281A/Stat241A Lecture 17
Slide 1: CS281A/Stat241A Lecture 17. Factor Analysis and State Space Models. Peter Bartlett.
Slide 2: Key ideas of this lecture

Factor Analysis.
- Recall: Gaussian factors plus observations.
- Parameter estimation with EM.

The vec operator.
- Motivation: natural parameters of the Gaussian.
- Linear functions of matrices as inner products.
- Kronecker product.

State Space Models.
- Linear dynamical systems, Gaussian disturbances.
- All distributions are Gaussian: parameters suffice.
Slide 3: Factor Analysis: Definition

Factors $X \in \mathbb{R}^p$, observations $Y \in \mathbb{R}^d$.

Local conditionals:
$$p(x) = N(x \mid 0, I), \qquad p(y \mid x) = N(y \mid \mu + \Lambda x, \Psi).$$
Slide 4: Factor Analysis

[Figure: the observation mean $\mu$ and the low-dimensional subspace spanned by the columns $\lambda_1, \lambda_2$ of $\Lambda$; a point $\Lambda x$ lies in this subspace.]
Slide 5: Factor Analysis: Definition

Local conditionals: $p(x) = N(x \mid 0, I)$, $p(y \mid x) = N(y \mid \mu + \Lambda x, \Psi)$.

- The mean of $y$ is $\mu \in \mathbb{R}^d$.
- The matrix of factor loadings is $\Lambda \in \mathbb{R}^{d \times p}$.
- The noise covariance $\Psi \in \mathbb{R}^{d \times d}$ is diagonal.

Thus there are $d + dp + d$ parameters, which for $p \ll d$ is far fewer than the order-$d^2$ of a full covariance matrix.
Slide 6: Factor Analysis: Marginals, Conditionals

Theorem.
1. $Y \sim N(\mu, \Lambda\Lambda^T + \Psi)$.
2. $(X, Y) \sim N\!\left(\begin{pmatrix}0 \\ \mu\end{pmatrix}, \Sigma\right)$, with $\Sigma = \begin{pmatrix} I & \Lambda^T \\ \Lambda & \Lambda\Lambda^T + \Psi \end{pmatrix}$.
3. $p(x \mid y)$ is Gaussian, with mean $\Lambda^T(\Lambda\Lambda^T + \Psi)^{-1}(y - \mu)$ and covariance $I - \Lambda^T(\Lambda\Lambda^T + \Psi)^{-1}\Lambda$.
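To make the theorem concrete, here is a small numpy sketch (not from the slides; the dimensions and parameter values are made up for illustration) that forms the marginal covariance $\Lambda\Lambda^T + \Psi$ and the posterior mean and covariance of $x$ given one observation $y$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 5, 2                                   # observation and factor dimensions (illustrative)
mu = rng.normal(size=d)                       # mean of y
Lam = rng.normal(size=(d, p))                 # loading matrix Lambda
Psi = np.diag(rng.uniform(0.5, 1.5, size=d))  # diagonal noise covariance

Sigma_y = Lam @ Lam.T + Psi                   # marginal covariance of y (part 1 of the theorem)
y = rng.multivariate_normal(mu, Sigma_y)      # one observation drawn from the marginal

# Posterior p(x | y): Gaussian with the mean and covariance from part 3 of the theorem.
K = Lam.T @ np.linalg.inv(Sigma_y)            # Lambda' (Lambda Lambda' + Psi)^{-1}
post_mean = K @ (y - mu)
post_cov = np.eye(p) - K @ Lam
```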
Slide 7: Factor Analysis: Parameter Estimation

i.i.d. data $y = (y_1, \ldots, y_n)$. The log likelihood is
$$\ell(\theta; y) = \log p(y \mid \theta) = \text{const} - \frac{n}{2}\log|\Lambda\Lambda^T + \Psi| - \frac{1}{2}\sum_i (y_i - \mu)^T(\Lambda\Lambda^T + \Psi)^{-1}(y_i - \mu).$$
Slide 8: Factor Analysis: Parameter Estimation

Let's first consider estimation of $\mu$: $\hat\mu_{ML} = \frac{1}{n}\sum_{i=1}^n y_i$, as for the full covariance case.

From now on, assume $\mu = 0$, so we can ignore it:
$$\ell(\theta; y) = \text{const} - \frac{n}{2}\log|\Lambda\Lambda^T + \Psi| - \frac{1}{2}\sum_i y_i^T(\Lambda\Lambda^T + \Psi)^{-1} y_i.$$

But how do we find a factorized covariance matrix $\Sigma = \Lambda\Lambda^T + \Psi$?
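The log likelihood above is easy to evaluate directly, which is handy for monitoring EM later. A minimal sketch (assuming numpy; the sample mean is plugged in for $\mu$):

```python
import numpy as np

def fa_loglik(Y, mu, Lam, Psi):
    """Log likelihood of the rows of Y under N(mu, Lam Lam' + Psi)."""
    n, d = Y.shape
    Sigma = Lam @ Lam.T + Psi
    _, logdet = np.linalg.slogdet(Sigma)
    R = Y - mu                                     # centered data, n x d
    quad = np.sum((R @ np.linalg.inv(Sigma)) * R)  # sum_i (y_i - mu)' Sigma^{-1} (y_i - mu)
    return -0.5 * n * d * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad

# e.g. fa_loglik(Y, Y.mean(axis=0), Lam, Psi), with Y an n x d data matrix.
```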
Slide 9: Factor Analysis: EM

We follow the usual EM recipe:
1. Write out the complete log likelihood, $\ell_c$.
2. E step: calculate $E[\ell_c \mid y]$. Typically, find $E[\text{suff. stats} \mid y]$.
3. M step: maximize $E[\ell_c \mid y]$.
Slide 10: Factor Analysis: EM

1. Write out the complete log likelihood, $\ell_c$:
$$\ell_c(\theta) = \log\big(p(x \mid \theta)\,p(y \mid x, \theta)\big) = \text{const} - \frac{n}{2}\log|\Psi| - \frac{1}{2}\sum_{i=1}^n x_i^T x_i - \frac{1}{2}\sum_{i=1}^n (y_i - \Lambda x_i)^T \Psi^{-1}(y_i - \Lambda x_i).$$
Slide 11: Factor Analysis: EM

2. E step: calculate $E[\ell_c \mid y]$. Typically, find $E[\text{suff. stats} \mid y]$.

Claim: the sufficient statistics are $x_i$ and $x_i x_i^T$.

Indeed, $E[\ell_c \mid y]$ is a constant plus $-\frac{n}{2}\log|\Psi|$ plus
$$-\frac{1}{2} E\!\left[\sum_{i=1}^n x_i^T x_i + (y_i - \Lambda x_i)^T \Psi^{-1}(y_i - \Lambda x_i) \,\Big|\, y\right]
= -\frac{1}{2}\sum_{i=1}^n \Big( E(x_i^T x_i \mid y_i) + E\big(\operatorname{tr}\big((y_i - \Lambda x_i)^T \Psi^{-1}(y_i - \Lambda x_i)\big) \mid y_i\big) \Big)
= -\frac{1}{2}\sum_{i=1}^n E(x_i^T x_i \mid y_i) - \frac{n}{2} E\big(\operatorname{tr}(S\Psi^{-1}) \mid y\big),$$
Slide 12: Factor Analysis: EM. E-step

$$E[\ell_c \mid y] = -\frac{n}{2}\log|\Psi| - \frac{1}{2}\sum_{i=1}^n E(x_i^T x_i \mid y_i) - \frac{n}{2} E\big(\operatorname{tr}(S\Psi^{-1}) \mid y\big),
\quad \text{with } S = \frac{1}{n}\sum_{i=1}^n (y_i - \Lambda x_i)(y_i - \Lambda x_i)^T.$$

We used the fact that the trace (sum of diagonal elements) satisfies $\operatorname{tr}(ABC) = \operatorname{tr}(CAB)$, as long as the products $ABC$ and $CAB$ are square.
Slide 13: Factor Analysis: EM. E-step

Now we can calculate
$$E(S \mid y) = \frac{1}{n}\sum_{i=1}^n E\big[\, y_i y_i^T - 2\Lambda x_i y_i^T + \Lambda x_i x_i^T \Lambda^T \mid y_i \,\big]
= \frac{1}{n}\sum_{i=1}^n \big( y_i y_i^T - 2\Lambda E[x_i \mid y_i]\, y_i^T + \Lambda E[x_i x_i^T \mid y_i]\, \Lambda^T \big),$$
and from this it is clear that the expected sufficient statistics are $E(x_i \mid y_i)$ and $E(x_i x_i^T \mid y_i)$ (and its trace, $E(x_i^T x_i \mid y_i)$).
Slide 14: Factor Analysis: EM. E-step

We calculated these conditional expectations earlier:
$$E[x_i \mid y_i] = \Lambda^T(\Lambda\Lambda^T + \Psi)^{-1}(y_i - \mu),$$
$$\operatorname{Var}[x_i \mid y_i] = I - \Lambda^T(\Lambda\Lambda^T + \Psi)^{-1}\Lambda,$$
$$E[x_i x_i^T \mid y_i] = \operatorname{Var}[x_i \mid y_i] + E[x_i \mid y_i]\, E[x_i \mid y_i]^T.$$
So we can plug them in to calculate the expected complete log likelihood.
Slide 15: Factor Analysis: EM. M-step

3. M step: maximize $E[\ell_c \mid y]$.

For $\Lambda$, this is equivalent to minimizing
$$n\operatorname{tr}\big(E(S \mid y)\,\Psi^{-1}\big) = \operatorname{tr}\big((Y^T - \Lambda X^T)(Y^T - \Lambda X^T)^T\,\Psi^{-1}\big),$$
where $Y \in \mathbb{R}^{n \times d}$ has rows $y_i^T$ and $X \in \mathbb{R}^{n \times p}$ has rows $x_i^T$.

This is a matrix version of linear regression, with each of the $d$ components of the squared error weighted by one of the diagonal entries of $\Psi^{-1}$. Since $\Psi$ is diagonal, these are $d$ decoupled least-squares problems, so the weights do not change the minimizer: $\Psi$ plays no role, and the solution satisfies the normal equations.
Slide 16: Factor Analysis: EM. M-step

Normal equations:
$$\hat\Lambda^T = \Big(\sum_{i=1}^n E[x_i x_i^T \mid y_i]\Big)^{-1} \sum_{i=1}^n E[x_i \mid y_i]\, y_i^T.$$

(These are the same sufficient statistics as in linear regression; here we need to compute the expectation of the sufficient statistics given the observations.)
Slide 17: Factor Analysis: EM. M-step

For $\Psi$, we need to find a diagonal $\Psi$ to minimize
$$\log|\Psi| + \operatorname{tr}\big(E(S \mid y)\,\Psi^{-1}\big) = \sum_{j=1}^d \big(\log\psi_j + s_j/\psi_j\big).$$
It is easy to check that this is minimized at $\psi_j = s_j$, the diagonal entries of $E[S \mid y]$.
Slide 18: Factor Analysis: EM. Summary

E step: calculate the expected sufficient statistics
$$E[x_i \mid y_i] = \Lambda^T(\Lambda\Lambda^T + \Psi)^{-1}(y_i - \mu),$$
$$\operatorname{Var}[x_i \mid y_i] = I - \Lambda^T(\Lambda\Lambda^T + \Psi)^{-1}\Lambda,$$
$$E[x_i x_i^T \mid y_i] = \operatorname{Var}[x_i \mid y_i] + E[x_i \mid y_i]\, E[x_i \mid y_i]^T,$$
and use these to compute the (diagonal entries of the) matrix
$$E(S \mid y) = \frac{1}{n}\sum_{i=1}^n \big( y_i y_i^T - 2\Lambda E[x_i \mid y_i]\, y_i^T + \Lambda E[x_i x_i^T \mid y_i]\, \Lambda^T \big).$$
Slide 19: Factor Analysis: EM. Summary

M step: maximize $E[\ell_c \mid y]$:
$$\hat\Lambda^T = \Big(\sum_{i=1}^n E[x_i x_i^T \mid y_i]\Big)^{-1} \sum_{i=1}^n E[x_i \mid y_i]\, y_i^T,
\qquad \hat\Psi = \operatorname{diag}\big(E(S \mid y)\big),$$
and recall $\hat\mu = \frac{1}{n}\sum_{i=1}^n y_i$.
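Putting the E and M steps together, a minimal EM loop for factor analysis might look as follows. This is a sketch, not the course's reference implementation: the initialization, the fixed iteration count, and the simplified form of the $\Psi$ update (which uses the freshly updated $\Lambda$) are choices made here for brevity.

```python
import numpy as np

def factor_analysis_em(Y, p, n_iter=100, seed=0):
    """EM for factor analysis. Y is an n x d data matrix, p is the number of factors."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    mu = Y.mean(axis=0)                       # ML estimate of the mean
    R = Y - mu                                # centered observations, rows (y_i - mu)'
    Lam = rng.normal(scale=0.1, size=(d, p))  # initial loading matrix
    psi = np.full(d, R.var(axis=0).mean())    # initial diagonal of Psi

    for _ in range(n_iter):
        # E step: expected sufficient statistics E[x_i | y_i] and sum_i E[x_i x_i' | y_i].
        Sigma_y = Lam @ Lam.T + np.diag(psi)
        K = Lam.T @ np.linalg.inv(Sigma_y)    # Lambda' (Lambda Lambda' + Psi)^{-1}, p x d
        Ex = R @ K.T                          # n x p, row i is E[x_i | y_i]'
        Vx = np.eye(p) - K @ Lam              # posterior covariance, shared by all i
        Sxx = n * Vx + Ex.T @ Ex              # sum_i E[x_i x_i' | y_i]

        # M step: normal equations for Lambda, then diagonal entries of E[S | y] for Psi.
        Lam = (R.T @ Ex) @ np.linalg.inv(Sxx)
        # With the updated Lambda, diag E[S|y] simplifies to diag(R'R - Lambda sum_i E[x_i|y_i] y_i') / n.
        psi = np.diag(R.T @ R - Lam @ (Ex.T @ R)) / n

    return mu, Lam, np.diag(psi)
```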
Slide 20: Key ideas of this lecture

Factor Analysis.
- Recall: Gaussian factors plus observations.
- Parameter estimation with EM.

The vec operator.
- Motivation: natural parameters of the Gaussian.
- Linear functions of matrices as inner products.
- Kronecker product.

State Space Models.
- Linear dynamical systems, Gaussian disturbances.
- All distributions are Gaussian: parameters suffice.
- Inference: Kalman filter and smoother.
- Parameter estimation with EM.
Slide 21: The vec operator: Motivation

Consider the multivariate Gaussian:
$$p(x) = (2\pi)^{-d/2}\,|\Sigma|^{-1/2}\exp\!\Big(-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x - \mu)\Big).$$
What is the natural parameterization
$$p(x) = h(x)\exp\big(\theta^T T(x) - A(\theta)\big)?$$
Slide 22: The vec operator: Motivation

If we define
$$\Lambda = \Sigma^{-1}, \qquad \eta = \Sigma^{-1}\mu,$$
then we can write
$$(x - \mu)^T\Sigma^{-1}(x - \mu) = \mu^T\Sigma^{-1}\mu - 2\mu^T\Sigma^{-1}x + x^T\Sigma^{-1}x = \eta^T\Lambda^{-1}\eta - 2\eta^T x + x^T\Lambda x.$$
Why is this of the form $\theta^T T(x) - A(\theta)$?
Slide 23: The vec operator: Example

$$\begin{pmatrix} x_1 & x_2 \end{pmatrix}
\begin{pmatrix} \lambda_{11} & \lambda_{12} \\ \lambda_{21} & \lambda_{22} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \lambda_{11}x_1^2 + \lambda_{21}x_1 x_2 + \lambda_{12}x_1 x_2 + \lambda_{22}x_2^2
= \begin{pmatrix} \lambda_{11} & \lambda_{21} & \lambda_{12} & \lambda_{22} \end{pmatrix}
\begin{pmatrix} x_1^2 \\ x_1 x_2 \\ x_2 x_1 \\ x_2^2 \end{pmatrix}
= \operatorname{vec}(\Lambda)^T \operatorname{vec}(xx^T).$$
Slide 24: The vec operator

Definition [vec]: for a matrix $A$ with column vectors $a_1, a_2, \ldots, a_n$, i.e. $A = (a_1 \; a_2 \; \cdots \; a_n)$,
$$\operatorname{vec}(A) = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}.$$
Slide 25: The vec operator

Theorem: $\operatorname{tr}(A^T B) = \operatorname{vec}(A)^T\operatorname{vec}(B)$.
(The trace of the product is the sum of the corresponding row-column inner products.)

In the example above,
$$x^T\Lambda x = \operatorname{tr}(x^T\Lambda x) = \operatorname{tr}(\Lambda x x^T)
= \underbrace{\operatorname{vec}(\Lambda^T)^T}_{\text{nat. param.}}\;\underbrace{\operatorname{vec}(xx^T)}_{\text{suff. stat.}}.$$
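A quick numerical check of the identity and of the quadratic-form example (a throwaway numpy sketch; note that vec, i.e. column stacking, is column-major flattening):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))

# vec stacks the columns, which is column-major ('F') flattening in numpy.
vec = lambda M: M.flatten(order="F")

assert np.isclose(np.trace(A.T @ B), vec(A) @ vec(B))

# The quadratic-form example: x' Lam x = vec(Lam')' vec(x x').
x = rng.normal(size=3)
Lam = rng.normal(size=(3, 3))
assert np.isclose(x @ Lam @ x, vec(Lam.T) @ vec(np.outer(x, x)))
```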
Slide 26: The Kronecker product

We also use the vec operator for matrix equations like $X\Theta Y = Z$, or, for instance, for least-squares minimization of $\|X\Theta Y - Z\|$. Then we can write
$$\operatorname{vec}(Z) = \operatorname{vec}(X\Theta Y) = (Y^T \otimes X)\operatorname{vec}(\Theta),$$
where $Y^T \otimes X$ is the Kronecker product of $Y^T$ and $X$:
Slide 27: The Kronecker product

Definition [Kronecker product]: the Kronecker product of $A \in \mathbb{R}^{m \times n}$ and $B$ is
$$A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{pmatrix}.$$
Slide 28: The Kronecker product

Theorem [Kronecker product and vec operator]:
$$\operatorname{vec}(ABC) = (C^T \otimes A)\operatorname{vec}(B).$$

The Kronecker product and vec operator are used in matrix algebra. They are convenient for differentiation of a function of a matrix, and they arise:
- in statistical models involving products of features;
- in systems theory; and
- in stability theory.
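Again, the identity is easy to check numerically (a sketch with arbitrary shapes):

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C = rng.normal(size=(2, 3)), rng.normal(size=(3, 4)), rng.normal(size=(4, 5))
vec = lambda M: M.flatten(order="F")   # column stacking

# vec(ABC) = (C' kron A) vec(B)
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))
```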
Slide 29: Key ideas of this lecture (outline repeated; see slide 20).
Slide 30: State Space Models

  mixture model (discrete latent variable)    factor model (Gaussian latent variable)
  HMM (discrete latent state)                 state space model (Gaussian latent state)

(The bottom row adds Markov dynamics to the static models in the top row.)
Slide 31: State Space Models

In linear dynamical systems,
- the evolution of the state $x_t$, and
- the relationship between the state $x_t$ and the observation $y_t$
are linear, with Gaussian noise.
Slide 32: State Space Models

State space models revolutionized control theory in the late 1950s and early 1960s. Before these models, classical control theory could cope only with decoupled, low-order systems. State space models allowed the effective control of complex systems such as spacecraft and fast aircraft.

The directed graph is identical to that of an HMM, so the conditional independencies are identical: given the current state (not the observation), the past and the future are independent.
Slide 33: Linear System: Definition

[Figure: a Markov chain of states $x_{t-1} \to x_t \to x_{t+1}$ with $x_t \in \mathbb{R}^p$, each state emitting an observation $y_{t-1}, y_t, y_{t+1}$ with $y_t \in \mathbb{R}^d$.]
Slide 34: Linear System: Definition

- State: $x_t \in \mathbb{R}^p$.
- Observation: $y_t \in \mathbb{R}^d$.
- Initial state: $x_0 \sim N(0, P_0)$.
- Dynamics: $x_{t+1} = A x_t + G w_t$, with $w_t \sim N(0, Q)$.
- Observation: $y_t = C x_t + v_t$, with $v_t \sim N(0, R)$.
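A minimal simulation of such a system; the particular matrices A, G, C, Q, R, P_0 below are arbitrary illustrative choices, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(3)
p, d, T = 2, 1, 200
A = np.array([[0.99, 0.10],
              [0.00, 0.95]])                  # state transition
G = np.eye(p)                                 # disturbance gain
C = np.array([[1.0, 0.0]])                    # observation matrix
Q = 0.01 * np.eye(p)                          # disturbance covariance
R = 0.10 * np.eye(d)                          # observation noise covariance
P0 = np.eye(p)                                # initial state covariance

x = rng.multivariate_normal(np.zeros(p), P0)  # x_0 ~ N(0, P_0)
xs, ys = [], []
for t in range(T):
    ys.append(C @ x + rng.multivariate_normal(np.zeros(d), R))  # y_t = C x_t + v_t
    xs.append(x)
    x = A @ x + G @ rng.multivariate_normal(np.zeros(p), Q)     # x_{t+1} = A x_t + G w_t
xs, ys = np.array(xs), np.array(ys)
```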
Slide 35: Linear System: Observations

1. All the distributions are Gaussian (joints, marginals, conditionals), so they can be described by their means and variances.

2. The conditional distribution of the next state, $x_{t+1} \mid x_t$, is $N(Ax_t, GQG^T)$. To see this:
$$E(x_{t+1} \mid x_t) = A x_t + G\,E(w_t \mid x_t) = A x_t,$$
$$\operatorname{Var}(x_{t+1} \mid x_t) = E\big(Gw_t(Gw_t)^T\big) = GQG^T.$$
Slide 36: Linear System: Observations

3. The marginal distribution of $x_t$ is $N(0, P_t)$, where $P_0$ is given and, for $t \ge 0$,
$$P_{t+1} = A P_t A^T + GQG^T.$$
To see this:
$$E x_{t+1} = E\,E(x_{t+1} \mid x_t) = A\,E(x_t) = 0,$$
$$P_{t+1} = E(x_{t+1} x_{t+1}^T) = E\big((Ax_t + Gw_t)(Ax_t + Gw_t)^T\big) = A P_t A^T + GQG^T.$$
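The covariance recursion can be unrolled directly; the sketch below continues the simulation above (reusing A, G, Q, P0, T).

```python
# Unroll P_{t+1} = A P_t A' + G Q G' starting from P_0 (same A, G, Q, P0, T as above).
P = P0.copy()
marginal_covs = [P]
for t in range(T - 1):
    P = A @ P @ A.T + G @ Q @ G.T
    marginal_covs.append(P)
# marginal_covs[t] is the covariance of the marginal distribution of x_t.
```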
Slide 37: Inference in SSMs

- Filtering: $p(x_t \mid y_0, \ldots, y_t)$.
- Smoothing: $p(x_t \mid y_0, \ldots, y_T)$.

For inference, it suffices to calculate the appropriate conditional means and covariances.
Slide 38: Inference in SSMs: Notation

$$\hat x_{t|s} = E(x_t \mid y_0, \ldots, y_s), \qquad
P_{t|s} = E\big((x_t - \hat x_{t|s})(x_t - \hat x_{t|s})^T \mid y_0, \ldots, y_s\big).$$

- Filtering: $x_t \mid y_0, \ldots, y_t \sim N(\hat x_{t|t}, P_{t|t})$.
- Smoothing: $x_t \mid y_0, \ldots, y_T \sim N(\hat x_{t|T}, P_{t|T})$.

The Kalman filter is an inference algorithm for $\hat x_{t|t}$, $P_{t|t}$.
The Kalman smoother is an inference algorithm for $\hat x_{t|T}$, $P_{t|T}$.
Slide 39: The Kalman Filter

[Figure: the graphical model, with the conditioning set growing in two alternating steps.]

- Start: $p(x_t \mid y_0, \ldots, y_t)$, summarized by $\hat x_{t|t}, P_{t|t}$.
- Time update: $p(x_{t+1} \mid y_0, \ldots, y_t)$, summarized by $\hat x_{t+1|t}, P_{t+1|t}$.
- Measurement update: $p(x_{t+1} \mid y_0, \ldots, y_{t+1})$, summarized by $\hat x_{t+1|t+1}, P_{t+1|t+1}$.
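The two updates in the figure map onto a few lines of linear algebra. The sketch below is the standard predict/correct form of the Kalman filter, written for the model of slide 34; it is a generic implementation, not code taken from the lecture.

```python
import numpy as np

def time_update(x_hat, P, A, G, Q):
    """From p(x_t | y_0..y_t) to p(x_{t+1} | y_0..y_t)."""
    x_pred = A @ x_hat
    P_pred = A @ P @ A.T + G @ Q @ G.T
    return x_pred, P_pred

def measurement_update(x_pred, P_pred, y, C, R):
    """From p(x_{t+1} | y_0..y_t) and y_{t+1} to p(x_{t+1} | y_0..y_{t+1})."""
    S = C @ P_pred @ C.T + R                  # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)     # correct the prediction with the new observation
    P_new = P_pred - K @ C @ P_pred
    return x_new, P_new
```

Alternating these two functions over the simulated observations ys produces the filtered estimates $\hat x_{t|t}$, $P_{t|t}$; one common convention is to start from the prior $N(0, P_0)$ and apply a measurement update first.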
Slide 40: Key ideas of this lecture (outline repeated; see slide 20).