Parameter Estimation
Industrial AI Lab.
Generative Model

Linear generative model: $y = \omega^T x + \varepsilon$, with noise $\varepsilon \sim \mathcal{N}(0, \sigma^2)$.
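A minimal sketch of sampling from this generative model, assuming a scalar input and hypothetical parameter values (not from the slides):

```python
import numpy as np

# Hypothetical "true" parameters of the generative model (illustration only)
omega_true = 2.0   # weight w
sigma_true = 0.5   # noise standard deviation

rng = np.random.default_rng(0)
x = rng.uniform(0, 4, size=100)            # scalar inputs
eps = rng.normal(0, sigma_true, size=100)  # epsilon ~ N(0, sigma^2)
y = omega_true * x + eps                   # y = w^T x + epsilon (scalar case)
```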
Maximum Likelihood Estimation (MLE)

Estimate the parameters $\theta = (\omega, \sigma^2)$ given a generative model: given observed data, choose $\theta$ to maximize the likelihood. The generative model structure itself is an assumption.
Maximum Likelihood Estimation (MLE)

Find the parameters $\omega$ and $\sigma$ that maximize the likelihood of the observed data, $\mathcal{L}(\theta) = P(D \mid \theta)$. MLE is perhaps the simplest, yet most widely used, parameter estimation method.
Drawn from a Gaussian Distribution

You will often see the following derivation.
Drawn from a Gaussian Distribution

To maximize the log-likelihood $l$, set $\partial l / \partial \mu = 0$ and $\partial l / \partial \sigma = 0$.

BIG Lesson: we often compute a mean and a variance to summarize data statistics, which implicitly assumes the data set is Gaussian distributed. The good news: by the central limit theorem, the sample mean is (approximately) Gaussian distributed.
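A worked version of the derivation the slide alludes to, for i.i.d. samples $x_1, \dots, x_m$ drawn from $\mathcal{N}(\mu, \sigma^2)$:

$$
l(\mu, \sigma) = \log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) = -\frac{m}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{m}(x_i - \mu)^2
$$

Setting $\partial l/\partial \mu = 0$ and $\partial l/\partial \sigma = 0$ gives

$$
\hat{\mu} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \hat{\sigma}^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \hat{\mu})^2,
$$

i.e., the sample mean and the (biased) sample variance.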
Numerical Example

Compute the likelihood function, then maximize it: adjust the mean and variance of the Gaussian to maximize the product of the densities at the observed points.
Numerical Example

Numerical Example for a Gaussian
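A small numerical sketch of this idea, with hypothetical data: evaluate the likelihood on a grid of $(\mu, \sigma)$ values and compare the maximizer against the closed-form MLE:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical observations (illustration only)
data = np.array([1.8, 2.1, 2.4, 1.9, 2.6, 2.2])

def log_likelihood(mu, sigma, x):
    """Log-likelihood of i.i.d. Gaussian samples under N(mu, sigma^2)."""
    return np.sum(norm.logpdf(x, loc=mu, scale=sigma))

# Coarse grid search over (mu, sigma)
mus = np.linspace(1.0, 3.0, 201)
sigmas = np.linspace(0.05, 1.0, 96)
ll = np.array([[log_likelihood(m, s, data) for s in sigmas] for m in mus])
i, j = np.unravel_index(np.argmax(ll), ll.shape)
print("grid-search MLE:", mus[i], sigmas[j])

# Closed-form MLE for comparison (np.std uses the 1/m convention by default)
print("closed-form MLE:", data.mean(), data.std())
```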
When the Mean is Unknown

When the Variance is Unknown
Probabilistic Machine Learning

I personally believe this is a more fundamental way of looking at machine learning:
- Maximum Likelihood Estimation (MLE)
- Maximum a Posteriori (MAP)
- Probabilistic regression
- Probabilistic classification
- Probabilistic clustering
- Probabilistic dimension reduction
Maximum Likelihood Estimation (MLE)
Linear Regression: A Probabilistic View

Linear regression model with Gaussian (normal) errors.
Linear Regression: A Probabilistic View

BIG Lesson: maximizing the likelihood gives the same solution as least-squares optimization.
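Why the two coincide, for the Gaussian-noise model above: the log-likelihood of the data is

$$
\log \mathcal{L}(\omega) = -\frac{m}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{m}\left(y_i - \omega^T x_i\right)^2,
$$

so maximizing over $\omega$ drops the constant terms and

$$
\hat{\omega}_{MLE} = \arg\min_{\omega} \sum_{i=1}^{m} \left(y_i - \omega^T x_i\right)^2,
$$

which is exactly the least-squares objective.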
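A short sketch tying the two views together, with hypothetical data: the MLE for $\omega$ is computed by least squares, and the MLE for the noise variance is the mean squared residual:

```python
import numpy as np

# Hypothetical data drawn from y = w^T x + eps (illustration only)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.uniform(0, 4, size=50)])  # [1, x]: bias + slope
w_true = np.array([1.0, 2.0])
y = X @ w_true + rng.normal(0, 0.5, size=50)

# MLE for w coincides with least squares: w_hat = argmin ||y - X w||^2
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# MLE for the noise variance is the mean squared residual
residual = y - X @ w_hat
sigma2_hat = np.mean(residual**2)
print(w_hat, sigma2_hat)
```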
Maximum a Posteriori (MAP)
Data Fusion with Uncertainties

(Reference: Learning Theory, Reza Shadmehr, Johns Hopkins University; YouTube link.)

Two noisy sensors measure the same quantity $x$, giving readings $y_a$ and $y_b$; we write the pair in matrix form.
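A reconstruction of that matrix form, consistent with the $C$ and $R$ appearing on the next slide (Gaussian sensor noise with variances $\sigma_a^2$ and $\sigma_b^2$ is an assumption here):

$$
y = \begin{bmatrix} y_a \\ y_b \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} x + \varepsilon = Cx + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, R), \quad R = \begin{bmatrix} \sigma_a^2 & 0 \\ 0 & \sigma_b^2 \end{bmatrix}.
$$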
Data Fusion with Uncertainties

Find the maximum likelihood estimate: $x_{ML} = \left(C^T R^{-1} C\right)^{-1} C^T R^{-1} y$.
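A minimal sketch of the matrix-form fusion formula, with hypothetical readings and variances:

```python
import numpy as np

# Hypothetical readings and noise variances (illustration only)
y = np.array([2.3, 2.7])         # y_a, y_b
R = np.diag([0.4**2, 0.2**2])    # sensor noise covariance (sensor b is more accurate)
C = np.array([[1.0], [1.0]])     # both sensors observe the same x

# x_ML = (C^T R^{-1} C)^{-1} C^T R^{-1} y
R_inv = np.linalg.inv(R)
x_ml = np.linalg.solve(C.T @ R_inv @ C, C.T @ R_inv @ y)
print(x_ml)  # pulled toward 2.7, the reading from the lower-variance sensor
```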
Summary: Data Fusion with Less Uncertainty

BIG Lesson: two sensors are better than one (less uncertainty), and each sensor's accuracy (uncertainty) information matters as well. If $\sigma_a^2 = \sigma_b^2$, $x_{ML}$ lies midway between $\mu_a$ and $\mu_b$; if $\sigma_a^2 > \sigma_b^2$, $x_{ML}$ is pulled toward $\mu_b$, the reading from the more accurate sensor.
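For the scalar two-sensor case, the matrix formula reduces to a precision-weighted average (a standard reduction, shown here for reference):

$$
x_{ML} = \frac{\sigma_b^2\, \mu_a + \sigma_a^2\, \mu_b}{\sigma_a^2 + \sigma_b^2}, \qquad \frac{1}{\sigma_{ML}^2} = \frac{1}{\sigma_a^2} + \frac{1}{\sigma_b^2},
$$

so the fused variance is smaller than either individual variance, which is the sense in which two sensors beat one.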
Example of Two Rulers: 1D Examples

How the brain combines measurements from both haptic and visual channels.
Data Fusion with a 1D Example

Data Fusion with a 2D Example
Maximum a Posteriori Estimation (MAP)

Choose the $\theta$ that maximizes the posterior probability of $\theta$ (i.e., its probability in light of the observed data). The posterior probability of $\theta$ is given by Bayes' rule:

$$
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
$$

- $P(\theta)$: prior probability of $\theta$ (before seeing any data)
- $P(D \mid \theta)$: likelihood
- $P(D)$: probability of the data (independent of $\theta$)

Bayes' rule lets us update our belief about $\theta$ in light of the observed data.
Maximum a Posteriori Estimation (MAP)

When doing MAP, we usually maximize the log of the posterior probability over multiple observations $D = \{d_1, d_2, \dots, d_m\}$. This is the same as MLE except for the extra log-prior term. MAP allows incorporating our prior knowledge about $\theta$ into the estimate.
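Written out for conditionally independent observations, the MAP objective is

$$
\theta_{MAP} = \arg\max_{\theta} \log P(\theta \mid D) = \arg\max_{\theta} \left[ \sum_{i=1}^{m} \log P(d_i \mid \theta) + \log P(\theta) \right],
$$

where the first term is exactly the MLE objective and the second is the extra log-prior term.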
MAP for the Mean of a Univariate Gaussian

Suppose $\theta$ is a random variable with prior $\theta \sim \mathcal{N}(\mu, \sigma^2)$, i.e., prior knowledge: $\theta$ is unknown but $\mu$ and $\sigma^2$ are known. The observations $D = \{d_1, d_2, \dots, d_m\}$ are conditionally independent given $\theta$. The joint probability is $P(D, \theta) = P(\theta) \prod_{i=1}^{m} P(d_i \mid \theta)$.
MAP for the Mean of a Univariate Gaussian

MAP: choose $\theta_{MAP} = \arg\max_{\theta} P(\theta \mid D)$.
MAP for the Mean of a Univariate Gaussian
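Carrying out the maximization, under the additional assumption (not stated explicitly on the recovered slide) that each observation has unit variance, $d_i \mid \theta \sim \mathcal{N}(\theta, 1)$:

$$
\frac{\partial}{\partial \theta}\left[ -\frac{(\theta - \mu)^2}{2\sigma^2} - \sum_{i=1}^{m} \frac{(d_i - \theta)^2}{2} \right] = 0
\quad\Longrightarrow\quad
\theta_{MAP} = \frac{\mu/\sigma^2 + \sum_{i=1}^{m} d_i}{1/\sigma^2 + m}.
$$

The prior contributes like $1/\sigma^2$ pseudo-observations of value $\mu$, which is the "prior acts as data" lesson on the next slide.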
MAP for the Mean of a Univariate Gaussian

ML interpretation; BIG Lesson: the prior acts like additional data. With $m = 0$ observations, $\theta_{MAP} = \mu$ (the prior mean); as $m \to \infty$, $\theta_{MAP} \to \bar{X}$ (the sample mean, i.e., the MLE). Note: everyday examples of prior knowledge include education, getting older, and school rankings.
MAP for the Mean of a Univariate Gaussian

Example (experiment in class): which one do you think is heavier?
- with eyes closed
- with visual inspection
- with haptic (touch) inspection
MAP Python Code

MAP for the mean of a univariate Gaussian: suppose $\theta$ is a random variable with prior $\theta \sim \mathcal{N}(\mu, \sigma^2)$, i.e., unknown $\theta$ with known $\mu$ and $\sigma^2$.
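A minimal sketch in Python of the MAP estimate for the Gaussian mean, assuming unit observation variance ($d_i \mid \theta \sim \mathcal{N}(\theta, 1)$) and hypothetical values for $\mu$, $\sigma^2$, and the data:

```python
import numpy as np

# Hypothetical prior: theta ~ N(mu, sigma2), with mu and sigma2 known
mu, sigma2 = 2.0, 0.5

# Hypothetical observations, assumed d_i | theta ~ N(theta, 1)
d = np.array([2.6, 2.9, 2.4, 3.1, 2.7])
m = len(d)

# Closed-form MAP estimate: (mu/sigma2 + sum(d_i)) / (1/sigma2 + m)
theta_map = (mu / sigma2 + d.sum()) / (1.0 / sigma2 + m)
theta_mle = d.mean()  # MLE ignores the prior

print(f"MLE: {theta_mle:.3f}")
print(f"MAP: {theta_map:.3f} (pulled toward the prior mean {mu})")
```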
Optional: Object Tracking in Computer Vision

Lecture: Introduction to Computer Vision by Prof. Aaron Bobick at Georgia Tech.
Kernel Density Estimation

A non-parametric estimate of a density. Lecture: Learning Theory (Reza Shadmehr, Johns Hopkins University).
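A minimal sketch of a Gaussian-kernel density estimate, $\hat{p}(x) = \frac{1}{mh}\sum_{i=1}^{m} K\!\left(\frac{x - d_i}{h}\right)$, using hypothetical samples and bandwidth:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical bimodal samples (illustration only)
rng = np.random.default_rng(2)
d = np.concatenate([rng.normal(0, 1, 40), rng.normal(4, 0.5, 20)])

def kde(x, data, h):
    """Gaussian KDE: average of kernels centered on the data points."""
    return norm.pdf((x[:, None] - data[None, :]) / h).mean(axis=1) / h

xs = np.linspace(-3, 6, 200)
density = kde(xs, d, h=0.4)  # bandwidth h controls smoothness
print(density.sum() * (xs[1] - xs[0]))  # numerical integral, close to 1
```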