Parameter Estimation. Industrial AI Lab.


Generative Model. Inputs $X$, outputs $Y$, and parameter $\omega$: $y = \omega^T x + \varepsilon$, with noise $\varepsilon \sim N(0, \sigma^2)$.

Maximum Likelihood Estimation (MLE). Estimate the parameters $\theta = (\omega, \sigma^2)$ of a generative model: given observed data and an assumed generative model structure, choose the $\theta$ that maximizes the likelihood of the data.

Maximum Likelihood Estimation (MLE). Find the parameters $\omega$ and $\sigma$ that maximize the likelihood over the observed data. MLE is perhaps the simplest (but most widely used) parameter estimation method.
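The likelihood expression itself did not survive the transcription; the standard definition, assuming i.i.d. observations $D = \{d_1, \dots, d_m\}$, is

$\mathcal{L}(\theta) = P(D \mid \theta) = \prod_{i=1}^{m} p(d_i \mid \theta), \qquad \hat{\theta}_{ML} = \arg\max_{\theta} \sum_{i=1}^{m} \log p(d_i \mid \theta).$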

Drawn from a Gaussian Distribution. You will often see the following derivation.
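The derivation on the slide is not legible; the usual version, for samples $x_1, \dots, x_m$ drawn i.i.d. from $N(\mu, \sigma^2)$, starts from the log-likelihood

$\ell(\mu, \sigma) = \log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) = -\frac{m}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{m}(x_i-\mu)^2.$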

Drawn from a Gaussian Distribution. To maximize, set $\partial \ell / \partial \mu = 0$ and $\partial \ell / \partial \sigma = 0$. BIG Lesson: we often compute a mean and a variance to represent data statistics, which implicitly assumes that the data set is Gaussian distributed. The good news: the sample mean is (approximately) Gaussian distributed by the central limit theorem.
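Solving the two first-order conditions gives the familiar closed forms (the slide's own equations are not legible; this is the standard result):

$\hat{\mu} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \hat{\sigma}^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \hat{\mu})^2.$

Note that the MLE of the variance divides by $m$, not by the unbiased $m-1$.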

Numerical Example. Compute the likelihood function, then maximize it: adjust the mean and the variance of the Gaussian so as to maximize the product of the individual densities.

Numerical Example for Gaussian.
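The plots on these slides are not recoverable; the following is a minimal Python sketch of the same idea, with a hypothetical data set (not the slide's original code):

import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

x = np.array([1.8, 2.1, 2.4, 2.7, 3.0])    # hypothetical observations

def neg_log_likelihood(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)               # parameterize to keep sigma positive
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

# The numerical optimum matches the closed-form MLE:
print(mu_hat, x.mean())                     # sample mean
print(sigma_hat, x.std())                   # sqrt of (1/m) * sum of squared deviations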

When Mean is Unknown.

When Variance is Unknown.

Probabilistic Machine Learning. I personally believe this is a more fundamental way of looking at machine learning: Maximum Likelihood Estimation (MLE), Maximum a Posteriori (MAP), probabilistic regression, probabilistic classification, probabilistic clustering, and probabilistic dimension reduction.

Maximum Likelihood Estimation (MLE).

Linear Regression: A Probabilistic View. A linear regression model with Gaussian (normal) errors.
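The model equation is not legible in the transcription; consistent with the generative model at the start, the standard form is

$y_i = \omega^T x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2), \qquad \text{i.e.} \quad p(y_i \mid x_i; \omega, \sigma^2) = N(y_i \mid \omega^T x_i, \sigma^2).$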

Linear Regression: A Probabilistic View. BIG Lesson: maximizing the likelihood is the same as least-squares optimization.
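The slide's derivation is not legible; the standard argument: the log-likelihood of the observations is

$\ell(\omega) = -\frac{m}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{m}(y_i - \omega^T x_i)^2,$

so for fixed $\sigma$, maximizing $\ell(\omega)$ is equivalent to minimizing $\sum_i (y_i - \omega^T x_i)^2 = \lVert y - X\omega \rVert_2^2$, which is exactly the least-squares objective.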


Maximum a Posteriori (MAP).

Data Fusion with Uncertainties (Lecture: Learning Theory, Reza Shadmehr, Johns Hopkins University; YouTube link). Two sensors, $a$ and $b$, measure the same unknown quantity $x$, giving readings $y_a$ and $y_b$. In matrix form: $y = Cx + \varepsilon$ with $y = [y_a, y_b]^T$, $C = [1, 1]^T$, and noise covariance $R$.

Data Fusion with Uncertainties. Find $x_{ML} = (C^T R^{-1} C)^{-1} C^T R^{-1} y$.

Data Fusion with Uncertainties.

Summary: Data Fusion with Less Uncertainty. BIG Lesson: two sensors are better than one sensor, since fusing them yields less uncertainty. Accuracy (uncertainty) information about each sensor also matters: when $\sigma_a^2 = \sigma_b^2$, the fused estimate $x_{ML}$ lies midway between the sensor means $\mu_a$ and $\mu_b$; when $\sigma_a^2 > \sigma_b^2$, $x_{ML}$ lies closer to $\mu_b$, the more accurate sensor.
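A minimal Python sketch of this fusion rule, with hypothetical readings and variances (with $C = [1,1]^T$ and diagonal $R$, the matrix formula reduces to an inverse-variance weighted average):

import numpy as np

# Hypothetical sensor readings and noise variances (not from the slides)
y_a, var_a = 10.2, 4.0     # noisier sensor
y_b, var_b = 9.6, 1.0      # more accurate sensor

# Matrix form: y = C x + eps, eps ~ N(0, R)
y = np.array([[y_a], [y_b]])
C = np.array([[1.0], [1.0]])
R = np.diag([var_a, var_b])

Rinv = np.linalg.inv(R)
x_ml = np.linalg.inv(C.T @ Rinv @ C) @ (C.T @ Rinv @ y)

# Equivalent inverse-variance weighted average
x_ml_scalar = (y_a / var_a + y_b / var_b) / (1 / var_a + 1 / var_b)
print(x_ml.item(), x_ml_scalar)   # both ~9.72, pulled toward the accurate sensor

# The fused variance is smaller than either sensor's variance
var_ml = 1 / (1 / var_a + 1 / var_b)
print(var_ml)                     # 0.8 < min(4.0, 1.0)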

Example of Two Rulers: 1D Examples. How the brain combines measurements from both the haptic and visual channels.

Data Fusion with 1D Example.

Data Fusion with 2D Example.

Maximum a Posteriori Estimation (MAP). Choose the $\theta$ that maximizes the posterior probability of $\theta$ (i.e., its probability in light of the observed data). The posterior probability of $\theta$ is given by Bayes' rule:

$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}$

where $P(\theta)$ is the prior probability of $\theta$ (before seeing any data), $P(D \mid \theta)$ is the likelihood, and $P(D)$ is the probability of the data (independent of $\theta$). Bayes' rule lets us update our belief about $\theta$ in light of the observed data.

Maximum a Posteriori Estimation (MAP). In practice, we maximize the log of the posterior probability. For multiple observations $D = \{d_1, d_2, \dots, d_m\}$:

$\theta_{MAP} = \arg\max_{\theta} \Big[ \sum_{i=1}^{m} \log P(d_i \mid \theta) + \log P(\theta) \Big]$

This is the same as MLE except for the extra log-prior term; MAP thus allows incorporating our prior knowledge about $\theta$ into its estimation.

MAP for the Mean of a Univariate Gaussian. Suppose $\theta$ is a random variable with prior $\theta \sim N(\mu, 1^2)$, where $\theta$ is unknown but $\mu$ and the observation noise $\sigma^2$ are known. The observations $D = \{d_1, d_2, \dots, d_m\}$, with $d_i \mid \theta \sim N(\theta, \sigma^2)$, are conditionally independent given $\theta$, so the joint probability factorizes as $P(D, \theta) = P(\theta) \prod_i P(d_i \mid \theta)$.

MAP for the Mean of a Univariate Gaussian. MAP: choose $\theta_{MAP} = \arg\max_\theta P(\theta \mid D)$.

MAP for the Mean of a Univariate Gaussian.
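The slide's algebra is not legible; under the setup above (prior $\theta \sim N(\mu, 1^2)$, observations $d_i \mid \theta \sim N(\theta, \sigma^2)$), setting the derivative of the log-posterior to zero gives the standard closed form

$\theta_{MAP} = \frac{\mu + \frac{1}{\sigma^2}\sum_{i=1}^{m} d_i}{1 + \frac{m}{\sigma^2}} = \frac{\sigma^2 \mu + m\bar{X}}{\sigma^2 + m}, \qquad \bar{X} = \frac{1}{m}\sum_{i=1}^{m} d_i.$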

MAP for the Mean of a Univariate Gaussian. ML interpretation. BIG Lesson: a prior acts like additional data. With no observations ($m = 0$), $\theta_{MAP} = \mu$, the prior mean; as $m \to \infty$, $\theta_{MAP} \to \bar{X}$, the sample mean, and the prior is washed out. Note: examples of prior knowledge include education, getting older, and a school's ranking.

MAP for the Mean of a Univariate Gaussian. Example: an experiment in class. Which object do you think is heavier? Try with eyes closed, with visual inspection, and with haptic (touch) inspection.

MAP Python Code (for the mean of a univariate Gaussian). Suppose $\theta$ is a random variable with prior $\theta \sim N(\mu, 1^2)$, where $\theta$ is unknown but $\mu$ and $\sigma^2$ are known.
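The code itself did not survive the transcription; the following is a minimal sketch under the setup above, with hypothetical values for $\mu$, $\sigma$, and the data (not the original slide code):

import numpy as np

np.random.seed(0)

# Prior: theta ~ N(mu, 1^2), with mu known (hypothetical value)
mu = 5.0

# Likelihood: d_i | theta ~ N(theta, sigma^2), sigma known (hypothetical)
sigma = 2.0
theta_true = 6.0
m = 10
d = theta_true + sigma * np.random.randn(m)

# Closed-form MAP estimate (derivative of the log-posterior set to zero)
theta_map = (mu + d.sum() / sigma**2) / (1 + m / sigma**2)

# Compare with the MLE (sample mean): MAP is pulled toward the prior mean
theta_mle = d.mean()
print(theta_map, theta_mle, mu)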


Optional: Object Tracking in Computer Vision (Lecture: Introduction to Computer Vision by Prof. Aaron Bobick at Georgia Tech).

Object Tracking in Computer Vision.

Kernel Density Estimation: a non-parametric estimate of a density (Lecture: Learning Theory, Reza Shadmehr, Johns Hopkins University).

Kernel Density Estimation.
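The slide's plots are not recoverable; a minimal Python sketch of a Gaussian kernel density estimate, with hypothetical data and bandwidth, is:

import numpy as np

np.random.seed(0)

# Hypothetical bimodal 1D samples (not from the slides)
x = np.concatenate([np.random.randn(50), 4 + np.random.randn(50)])
h = 0.5                                    # bandwidth (smoothing parameter)

def kde(t, x, h):
    # f_hat(t) = (1 / (m*h)) * sum_i K((t - x_i) / h), with Gaussian kernel K
    u = (t[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return K.sum(axis=1) / (len(x) * h)

grid = np.linspace(-4.0, 8.0, 200)
density = kde(grid, x, h)
print(density.sum() * (grid[1] - grid[0]))  # Riemann sum, should be close to 1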