ECE285/SIO209, Machine learning for physical applications, Spring 2017

Peter Gerstoft, 534-7768, gerstoft@ucsd.edu
We meet Wednesday from 5 to 6:20pm in Spies Hall 330. Text: Bishop. Grading: A or maybe S.
Classes
- First 4 weeks: focus on theory/implementation
- Middle 3 weeks: 50% applications plus 50% theory
- Last 3 weeks: 30% final project, 30% applications plus 50% theory
Applications
- Graph theory for source localization: Gerstoft
- Source tracking in ocean acoustics: grad student Emma Reeves
- Aramco Research: Weichang Li, group leader
- Seismic network using mobile phones: Berkeley
- Eric Orenstein: identifying plankton
- Plus more
- Dictionary learning
Homework: both Matlab and Python will be used. Just email the code to me (I don't need anything else). Homework is due 11am on Wednesday, so that we can discuss it in class.
HW 1: TritonEd? https://tritoned.ucsd.edu/ Matlab / Python / IPython / Jupyter?

Entropy: an important quantity in coding theory, statistical physics, and machine learning.
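The defining formula did not survive transcription; for a discrete variable it is the standard expression

H[x] = -\sum_{x} p(x) \ln p(x).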

Parametric Distributions. Basic building blocks: we need to determine the distribution p(x | θ) given observed data {x_1, ..., x_N}. Representation: a point estimate of θ, or a distribution over θ? Recall curve fitting.

The Gaussian Distribution
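The density on this slide was lost in transcription; the univariate Gaussian has the standard form

N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}.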

Central Limit Theorem The distribution of the sum of N i.i.d. random variables becomes increasingly Gaussian as N grows. Example: N uniform [0,1] random variables.
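A minimal numpy sketch of the slide's example (the sample sizes and seed are illustrative choices, not from the slide): the mean of N uniform [0, 1] variables approaches a Gaussian with mean 1/2 and variance 1/(12N).

import numpy as np

rng = np.random.default_rng(0)
for N in (1, 2, 10):
    # Draw many means of N uniform [0, 1] variables
    means = rng.uniform(0.0, 1.0, size=(100000, N)).mean(axis=1)
    # Compare sample mean/variance with the Gaussian limit N(1/2, 1/(12 N))
    print(N, means.mean(), means.var(), 0.5, 1 / (12 * N))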

Geometry of the Multivariate Gaussian
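The equations for this slide are missing; the D-dimensional density and the Mahalanobis distance that governs its geometry are

N(\mathbf{x} \mid \boldsymbol\mu, \boldsymbol\Sigma) = \frac{1}{(2\pi)^{D/2} |\boldsymbol\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (\mathbf{x}-\boldsymbol\mu)^\mathsf{T} \boldsymbol\Sigma^{-1} (\mathbf{x}-\boldsymbol\mu) \right\},
\qquad \Delta^2 = (\mathbf{x}-\boldsymbol\mu)^\mathsf{T} \boldsymbol\Sigma^{-1} (\mathbf{x}-\boldsymbol\mu).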

Moments of the Multivariate Gaussian (1) The first-order moment is E[x] = μ: the term linear in z = x − μ integrates to zero thanks to the anti-symmetry of z (the exponent is even in z, so the odd term vanishes).

Moments of the Multivariate Gaussian (2) A general Gaussian requires D(D−1)/2 + D = D(D+1)/2 covariance parameters, plus D for the mean. Often we restrict to a diagonal covariance (D + D parameters including the mean) or an isotropic covariance (D + 1 parameters including the mean).
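For reference, the moment results these two slides refer to (the equations were lost in transcription) are

\mathbb{E}[\mathbf{x}] = \boldsymbol\mu, \qquad
\mathbb{E}[\mathbf{x}\mathbf{x}^\mathsf{T}] = \boldsymbol\mu\boldsymbol\mu^\mathsf{T} + \boldsymbol\Sigma, \qquad
\operatorname{cov}[\mathbf{x}] = \boldsymbol\Sigma.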

Partitioned Gaussian Distributions

Partitioned Conditionals and Marginals

Partitioned Conditionals and Marginals
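The partitioned-Gaussian results these slides present were lost in transcription; the standard forms (Bishop Sections 2.3.1-2.3.2) are, for x partitioned into (x_a, x_b),

p(\mathbf{x}_a \mid \mathbf{x}_b) = N\!\left(\mathbf{x}_a \mid \boldsymbol\mu_{a|b}, \boldsymbol\Sigma_{a|b}\right), \quad
\boldsymbol\mu_{a|b} = \boldsymbol\mu_a + \boldsymbol\Sigma_{ab}\boldsymbol\Sigma_{bb}^{-1}(\mathbf{x}_b - \boldsymbol\mu_b), \quad
\boldsymbol\Sigma_{a|b} = \boldsymbol\Sigma_{aa} - \boldsymbol\Sigma_{ab}\boldsymbol\Sigma_{bb}^{-1}\boldsymbol\Sigma_{ba},

with marginal p(\mathbf{x}_a) = N(\mathbf{x}_a \mid \boldsymbol\mu_a, \boldsymbol\Sigma_{aa}).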

Bayes' Theorem for Gaussian Variables: given a Gaussian marginal p(x) and a Gaussian conditional p(y | x) whose mean is a linear function of x, the marginal p(y) and the posterior p(x | y) are also Gaussian (the equations were lost in transcription; see the reconstruction below).
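A reconstruction of the linear-Gaussian result the slide states (Bishop Section 2.3.3): given p(\mathbf{x}) = N(\mathbf{x} \mid \boldsymbol\mu, \boldsymbol\Lambda^{-1}) and p(\mathbf{y} \mid \mathbf{x}) = N(\mathbf{y} \mid \mathbf{A}\mathbf{x} + \mathbf{b}, \mathbf{L}^{-1}), we have

p(\mathbf{y}) = N\!\left(\mathbf{y} \mid \mathbf{A}\boldsymbol\mu + \mathbf{b},\; \mathbf{L}^{-1} + \mathbf{A}\boldsymbol\Lambda^{-1}\mathbf{A}^\mathsf{T}\right),
\qquad
p(\mathbf{x} \mid \mathbf{y}) = N\!\left(\mathbf{x} \mid \boldsymbol\Sigma\{\mathbf{A}^\mathsf{T}\mathbf{L}(\mathbf{y}-\mathbf{b}) + \boldsymbol\Lambda\boldsymbol\mu\},\; \boldsymbol\Sigma\right),

where \boldsymbol\Sigma = (\boldsymbol\Lambda + \mathbf{A}^\mathsf{T}\mathbf{L}\mathbf{A})^{-1}.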

Maximum Likelihood for the Gaussian (1) Given i.i.d. data X = {x_1, ..., x_N}, the log likelihood function is given below; it depends on the data only through the sufficient statistics Σ_n x_n and Σ_n x_n x_nᵀ.
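The log likelihood the slide refers to (D is the data dimensionality):

\ln p(\mathbf{X} \mid \boldsymbol\mu, \boldsymbol\Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\boldsymbol\Sigma| - \frac{1}{2}\sum_{n=1}^{N} (\mathbf{x}_n - \boldsymbol\mu)^\mathsf{T}\boldsymbol\Sigma^{-1}(\mathbf{x}_n - \boldsymbol\mu).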

Maximum Likelihood for the Gaussian (2) Set the derivative of the log likelihood function with respect to μ to zero and solve to obtain μ_ML; similarly for Σ_ML (see below).
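The maximum likelihood solutions referred to above:

\boldsymbol\mu_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n, \qquad
\boldsymbol\Sigma_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(\mathbf{x}_n - \boldsymbol\mu_{\mathrm{ML}})(\mathbf{x}_n - \boldsymbol\mu_{\mathrm{ML}})^\mathsf{T}.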

Maximum Likelihood for the Gaussian (3) Under the true distribution, the ML mean is unbiased but the ML covariance is biased; hence define a corrected, unbiased estimator (see below).
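The expectations and the corrected estimator the slide defines:

\mathbb{E}[\boldsymbol\mu_{\mathrm{ML}}] = \boldsymbol\mu, \qquad
\mathbb{E}[\boldsymbol\Sigma_{\mathrm{ML}}] = \frac{N-1}{N}\boldsymbol\Sigma, \qquad
\widetilde{\boldsymbol\Sigma} = \frac{1}{N-1}\sum_{n=1}^{N}(\mathbf{x}_n - \boldsymbol\mu_{\mathrm{ML}})(\mathbf{x}_n - \boldsymbol\mu_{\mathrm{ML}})^\mathsf{T}.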

Sequential Estimation The ML mean can be updated one data point at a time: the new estimate equals the old estimate plus a correction given the Nth data point x_N, with correction weight 1/N (see below).
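The sequential update formula the slide annotates:

\boldsymbol\mu_{\mathrm{ML}}^{(N)} = \boldsymbol\mu_{\mathrm{ML}}^{(N-1)} + \frac{1}{N}\left(\mathbf{x}_N - \boldsymbol\mu_{\mathrm{ML}}^{(N-1)}\right).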

Bayesian Inference for the Gaussian (1) Assume σ² is known. Given i.i.d. data, the likelihood function for μ is the product of Gaussian factors, p(x | μ) = ∏_n N(x_n | μ, σ²). This has a Gaussian shape as a function of μ (but it is not a distribution over μ).

Bayesian Inference for the Gaussian (2) Combined with a Gaussian prior over μ, p(μ) = N(μ | μ₀, σ₀²), this gives the posterior p(μ | x). Completing the square over μ, we see that the posterior is again Gaussian (see below).
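The posterior parameters obtained by completing the square (Bishop Eqs. 2.141-2.142):

p(\mu \mid \mathbf{x}) = N(\mu \mid \mu_N, \sigma_N^2), \qquad
\mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\,\mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\,\mu_{\mathrm{ML}}, \qquad
\frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2}.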

Bayesian Inference for the Gaussian (4) Example: posterior of μ for N = 0, 1, 2 and 10 data points (N = 0 corresponds to the prior).

Bayesian Inference for the Gaussian (5) Sequential Estimation The posterior obtained after observing N − 1 data points becomes the prior when we observe the Nth data point.

Bayesian Inference for the Gaussian (6) Now assume μ is known. The likelihood function for the precision λ = 1/σ² is the same product of Gaussian factors, now viewed as a function of λ. This has a Gamma shape as a function of λ. The Gamma distribution is given below.
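The Gamma distribution the slide introduces:

\mathrm{Gam}(\lambda \mid a, b) = \frac{1}{\Gamma(a)}\, b^{a}\, \lambda^{a-1} e^{-b\lambda}.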

Bayesian Inference for the Gaussian (8) Now we combine a Gamma prior, Gam(λ | a₀, b₀), with the likelihood function for λ; the result we recognize as another Gamma distribution, Gam(λ | a_N, b_N), with updated parameters (see below).
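The updated Gamma parameters (Bishop Eqs. 2.150-2.151), where σ²_ML is the ML variance computed with the known μ:

a_N = a_0 + \frac{N}{2}, \qquad
b_N = b_0 + \frac{1}{2}\sum_{n=1}^{N}(x_n - \mu)^2 = b_0 + \frac{N}{2}\,\sigma_{\mathrm{ML}}^{2}.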

Bayesian Inference for the Gaussian (9) If both μ and λ are unknown, the joint likelihood function is a product of Gaussian factors in both μ and λ. We need a prior with the same functional dependence on μ and λ.

Bayesian Inference for the Gaussian (10) The Gaussian-gamma distribution: the exponent is quadratic in μ and linear in λ, multiplied by a Gamma distribution over λ that is independent of μ. (Figure: contours for μ₀ = 0, β = 2, a = 5, b = 6.)
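The Gaussian-gamma (normal-gamma) prior the slide describes, in its standard form:

p(\mu, \lambda) = N\!\left(\mu \mid \mu_0, (\beta\lambda)^{-1}\right)\,\mathrm{Gam}(\lambda \mid a, b).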

Bayesian Inference for the Gaussian (12) Multivariate conjugate priors: μ unknown, Λ known: p(μ) is Gaussian. Λ unknown, μ known: p(Λ) is Wishart. Λ and μ both unknown: p(μ, Λ) is Gaussian-Wishart.

Mixtures of Gaussians (1) Old Faithful geyser: the time between eruptions has a bimodal distribution, with the mean interval being either 65 or 91 minutes, depending on the length of the prior eruption. Within a margin of error of ±10 minutes, Old Faithful will erupt either 65 minutes after an eruption lasting less than 2 1/2 minutes, or 91 minutes after an eruption lasting more than 2 1/2 minutes. (Figure: fit with a single Gaussian vs. a mixture of two Gaussians.)

Mixtures of Gaussians (2) Combine simple models into a complex model: a weighted sum of Gaussian components, each with its own mixing coefficient (illustrated with K = 3); see the density below.
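The mixture density the slide shows:

p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k\, N(\mathbf{x} \mid \boldsymbol\mu_k, \boldsymbol\Sigma_k), \qquad 0 \le \pi_k \le 1, \quad \sum_{k=1}^{K}\pi_k = 1.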

Mixtures of Gaussians (3)

Mixtures of Gaussians (4) Determining the parameters π, μ, and Σ using maximum (log) likelihood: the log likelihood contains the log of a sum, so there is no closed-form maximum. Solution: use standard iterative numerical optimization methods, or the expectation-maximization (EM) algorithm (Chapter 9).
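Since the slide points ahead to EM, here is a minimal, self-contained numpy sketch of EM for a two-component 1-D Gaussian mixture on synthetic, Old-Faithful-like waiting times; the data, variable names, seed, and iteration count are illustrative choices, not course material.

import numpy as np

rng = np.random.default_rng(0)
# Synthetic bimodal data standing in for the Old Faithful waiting times
x = np.concatenate([rng.normal(65, 4, 100), rng.normal(91, 4, 150)])

K = 2
pi = np.full(K, 1.0 / K)        # mixing coefficients
mu = rng.choice(x, K)           # initial means
var = np.full(K, x.var())       # initial variances

for _ in range(200):
    # E step: responsibilities gamma[n, k] of component k for point n
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    gamma = pi * dens
    gamma /= gamma.sum(axis=1, keepdims=True)

    # M step: re-estimate parameters from responsibility-weighted statistics
    Nk = gamma.sum(axis=0)
    mu = (gamma * x[:, None]).sum(axis=0) / Nk
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / len(x)

print(mu, np.sqrt(var), pi)     # estimated means, std devs, mixing weights

With these synthetic data the estimated means land near 65 and 91, illustrating how EM sidesteps the log-of-a-sum problem by alternating the E and M steps.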