Probabilistic Graphical Models for Image Analysis - Lecture 1


Probabilistic Graphical Models for Image Analysis - Lecture 1
Alexey Gronskiy, Stefan Bauer
21 September 2018
Max Planck ETH Center for Learning Systems

Overview
1. Motivation - Why Graphical Models
2. Course
3. Probabilistic Modeling
4. Probabilistic Inference

Motivation - Why Graphical Models

Neuroscience: Manning et al. Topographic Factor Analysis: A Bayesian Model for Inferring Brain Networks from Neural Data. PLoS ONE, 2014.

Image Generation: Karras et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation. ICLR 2018.

Genomics: Mäkinen et al. Integrative Genomics Reveals Novel Molecular Pathways and Gene Networks for Coronary Artery Disease. PLoS Genetics, 2014.

Robotics: Grisetti et al. A Tutorial on Graph-Based SLAM. IEEE Intelligent Transportation Systems Magazine, 2010.

Course

Administrative Information
Exam: 20 minute oral exam in English.
Exercises: Each Tuesday from 15:00-16:00 in CAB G56; exercises will be incorporated into the lectures.
Contact: Please ask questions through the open forum at https://piazza.com/.
Additional information: http://www.vvz.ethz.ch/vorlesungsverzeichnis/lerneinheit.view?semkez=2018w&ansicht=KATALOGDATEN&lerneinheitId=126518&lang=en

Head TA - Philippe Wenk

Exercise Administration
Time & Room: Tuesday, 15:00-16:00, CAB G56.
Exercise: One exercise sheet per week. Hand it in one week afterwards. We return it to you the week after. No exercise next week!
Testat: No Testat requirement for the exam. If you are a PhD student you might want to talk to your Studiensekretariat.

Related Courses
Some overlap with the following courses:
- Advanced Machine Learning - Prof. Buhmann: http://ml2.inf.ethz.ch/courses/ml/
- Introduction to Machine Learning - Prof. Krause: https://las.inf.ethz.ch/teaching/introml-s18
- Probabilistic Artificial Intelligence - Prof. Krause: https://las.inf.ethz.ch/teaching/pai-f18
- Computational Intelligence Lab - Prof. Hofmann: http://cil.inf.ethz.ch
- Deep Learning - Prof. Hofmann: http://www.da.inf.ethz.ch/teaching/2017/deeplearning/
Other courses with some overlap are offered by the computer vision group.

Reading
Online resources:
- Barber: http://web4.cs.ucl.ac.uk/staff/d.barber/pmwiki/pmwiki.php?n=brml.online
- MacKay: http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
- Wainwright & Jordan: http://www.eecs.berkeley.edu/~wainwrig/papers/waijor08_ftml.pdf
Classics:
- Bishop: Pattern Recognition and Machine Learning
- Koller and Friedman: Probabilistic Graphical Models

Course outline
- Learning from data
- Probabilistic models
- Graphical models
- Variational Inference
- State Space Models
- Dynamical Systems
- Factor Analysis
- Autoencoders
- Current research

Probabilistic Modeling

Reasoning under Uncertainty
Knowledge Representation: Use domain knowledge to design representative models.
Automated Reasoning: Develop a general suite of algorithms for reasoning.
[Figure: Bayesian network with nodes Earthquake, Burglar, Radio, Alarm, Phonecall]
Deal with uncertainty inherent in the real world using the notion of probability.

Graphical Models
[Figure: Bayesian network with nodes Earthquake, Burglar, Radio, Alarm, Phonecall]
Graphical models provide a compact representation of two equivalent perspectives:
- a representation of a set of independencies
- a factorization of the joint distribution

Algorithms
Representation: What do the different edges mean, and how can they be used to model tasks in image analysis?
Learning: Given some empirical measurements (data), estimate the parameters of the model which best explain them.
Inference: For a given model, how to use it for making decisions and reasoning about the task at hand.

To do list for modeling
Representation: Develop techniques to represent dependencies between variables using graphical models.
Learning: Given some empirical measurements (x, y), estimate the parameters of the model (θ_ML, θ_MAP, ...) which best explain them.
Inference: For a given model (θ), investigate various ways of using it to make predictions, de-noise images, etc.

Machine Learning
Model the machine learning task as the joint probability P(x, y).
Example: x_i ∈ X, for example a set of images.
Label: y_i ∈ Y, for example whether the digit is 1, 3, 4 or 8.
Training Data: Consists of examples and associated labels, which are used to train the machine.
Testing Data: Consists of examples and associated labels. Labels are never used to train the machine, only to evaluate its performance.
Predictions: Output of the trained machine.

Supervised learning
Let X = {set of d × d images} and Y = {0, 1, ..., 9} = {set of labelings}.
Problem: Given labeled training data
  D = {(X_1, Y_1), ..., (X_n, Y_n)} ∈ X^n × Y^n,
find a function
  f : X → Y   (images to labels)
that correctly classifies all images, including new ones.

Learning a function
Suppose we have a class of functions F = {f_θ : X → Y | θ ∈ Θ}.
Empirical risk minimization: Define
  θ̂ := argmin_{θ ∈ Θ} E_P̂[ 1[f_θ(x_i) ≠ y_i] ]
How to interpret E_P̂[ 1[f_θ(x_i) ≠ y_i] ]? It is the number of incorrect classifications made by f_θ on the training data.
Empirical risk minimization finds the function f_θ that makes the fewest mistakes on the training data.
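As a concrete illustration (not part of the original slides), here is a minimal sketch of empirical risk minimization with the 0-1 loss, assuming a toy one-dimensional threshold classifier family f_θ(x) = 1[x > θ]; the data and the function class are made up for illustration.

```python
import numpy as np

# Toy training data (hypothetical): 1-D inputs with binary labels.
x = np.array([0.2, 0.8, 1.5, 2.3, 3.1, 3.9])
y = np.array([0,   0,   0,   1,   1,   1  ])

def f(theta, x):
    """Threshold classifier f_theta(x) = 1[x > theta]."""
    return (x > theta).astype(int)

def empirical_risk(theta, x, y):
    """Average 0-1 loss of f_theta on the training data."""
    return np.mean(f(theta, x) != y)

# Brute-force ERM over a grid of candidate thresholds (the parameter space Theta).
thetas = np.linspace(0.0, 4.0, 401)
risks = [empirical_risk(t, x, y) for t in thetas]
theta_star = thetas[int(np.argmin(risks))]
print(f"ERM estimate theta* = {theta_star:.2f}, training error = {min(risks):.2f}")
```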

Linear Regression
Consider the problem of linear regression,
  y = θᵀx + ε,
where the loss is the mean squared error:
  min_θ Σ_{i=1}^n (θᵀx_i - y_i)²
1. How are examples represented?
2. How are labels represented?
3. What does the machine estimate?
4. What does the machine output?
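A minimal numerical sketch (my addition, not from the slides) of the least-squares objective above, fit with ordinary least squares via `numpy.linalg.lstsq`; the data, dimensions, and noise level are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = theta_true^T x + noise (all values hypothetical).
n, d = 100, 3
theta_true = np.array([1.5, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ theta_true + 0.1 * rng.normal(size=n)

# Minimize sum_i (theta^T x_i - y_i)^2: ordinary least squares.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated theta:", np.round(theta_hat, 3))

# Predictions of the trained "machine" on new examples.
X_new = rng.normal(size=(5, d))
print("predictions:", np.round(X_new @ theta_hat, 3))
```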

Unsupervised learning
Problem: Given a class of models P(X; θ) for θ ∈ Θ and unlabeled data D = (x_1, ..., x_n) ∈ X^n sampled i.i.d. from an unknown distribution P(X), find the model θ that best fits the data.
In other words, we want P(X; θ) ≈ P(X) or, better, P(X; θ) = P(X).

The likelihood
Suppose we have a class of probability distributions M = {P(x | θ) : θ ∈ Θ} on X.
The probability of the observations, taken as a function
  L(θ | x) := P(x | θ)
where x is fixed and θ varies, is called the likelihood.
L(θ | x): how likely is parameter θ to have generated x? Or: how well does θ explain x?

Learning a model
Maximum likelihood estimation (ML)
Given data D = (x_1, ..., x_N) ∈ X^N, let
  L(θ | D) := P(D | θ) := Π_{i=1}^N P(x_i | θ)
be the likelihood of the parameters given the data.
Note: We assume the datapoints are independently, identically distributed (i.i.d.) given θ!
Goal: Find the parameter θ that is most likely to have generated the data D.
Definition: the maximum likelihood estimator (MLE) is
  θ_ML = argmax_{θ ∈ Θ} L(θ | D).
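A small sketch (my addition, not the lecture's) of maximum likelihood estimation for about the simplest model class, a Bernoulli distribution P(x | θ) = θ^x (1-θ)^(1-x); the simulated data and the grid search are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated i.i.d. Bernoulli data with a (hypothetical) true parameter of 0.7.
D = rng.binomial(1, 0.7, size=200)

def likelihood(theta, D):
    """L(theta | D) = prod_i P(x_i | theta) for a Bernoulli model."""
    return np.prod(theta ** D * (1 - theta) ** (1 - D))

# Grid search for the maximizer of the likelihood.
grid = np.linspace(0.01, 0.99, 99)
theta_ml_grid = grid[int(np.argmax([likelihood(t, D) for t in grid]))]

# For the Bernoulli model the MLE is also available in closed form: the sample mean.
theta_ml_closed = D.mean()

print(f"grid-search MLE: {theta_ml_grid:.2f}, closed-form MLE: {theta_ml_closed:.2f}")
```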

The negative log-likelihood
We usually minimize the negative log-likelihood -log P(x | θ) instead, since:
- The logarithm is monotonic: argmax_θ L(θ) = argmax_θ log L(θ), so {minimize negative log-likelihood} ⟺ {maximize likelihood}.
- It simplifies the math: log Π_i f(x_i) = Σ_i log f(x_i).
- It has better numerical behavior.
- Relation to empirical risk minimization (Exercise): show that computing the MLE ⟺ minimizing the empirical risk for the loss ℓ(f, x) = -log f(x).
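Continuing the illustrative Bernoulli sketch above (again my addition), minimizing the negative log-likelihood gives the same estimate as maximizing the likelihood, but it stays numerically stable even when the raw product of probabilities underflows.

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.binomial(1, 0.7, size=5000)  # large sample: the raw likelihood underflows to 0.0

def neg_log_likelihood(theta, D):
    """-log L(theta | D) = -sum_i log P(x_i | theta) for a Bernoulli model."""
    return -np.sum(D * np.log(theta) + (1 - D) * np.log(1 - theta))

grid = np.linspace(0.01, 0.99, 99)
raw_likelihoods = [np.prod(theta ** D * (1 - theta) ** (1 - D)) for theta in grid]
nlls = [neg_log_likelihood(theta, D) for theta in grid]

print("max raw likelihood on grid:", max(raw_likelihoods))          # 0.0 due to underflow
print("argmin NLL:", grid[int(np.argmin(nlls))], "sample mean:", D.mean())
```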

Probabilistic Inference

(Conditional) Independence of RVs
Independence of Random Variables: RVs X, Y are independent iff p(x, y) = p(x)p(y). I.e., in a simultaneous draw, the value taken by X does not depend on the value taken by Y, and vice versa.
Conditional Independence of Random Variables: RVs X, Y are conditionally independent given Z iff p(x, y | z) = p(x | z)p(y | z). This, however, does not imply independence of the RVs X and Y.

Marginalization and its exponential running time
Marginalization: Assume a joint distribution p(x_1, ..., x_N) of RVs X_1, ..., X_N is given. Often we're interested in marginals of RVs X_S, where S is a subset of the variable indices. The marginal can be computed as
  p(x_S) = Σ_{x_{{1,...,N} \ S}} p(x_1, ..., x_N)
Summation is expensive: Assume each X can take K different values, and S = {1}. A full summation will have K^{N-1} summands, i.e. running time exponential in the number of variables!
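A brief sketch (added for illustration, not from the slides) of brute-force marginalization: the joint of N binary variables is stored as an array with 2^N entries, and computing p(x_1) sums over all 2^(N-1) configurations of the remaining variables. The random joint used here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

# A random joint distribution over N binary variables, stored as a 2 x 2 x ... x 2 array.
N = 12
joint = rng.random(size=(2,) * N)
joint /= joint.sum()                      # normalize so it is a valid distribution

# Marginal of X_1: sum out the remaining N-1 variables (2**(N-1) summands per entry).
p_x1 = joint.sum(axis=tuple(range(1, N)))
print("p(X_1):", np.round(p_x1, 4))
print("table size:", joint.size, "entries, i.e. exponential in N =", N)
```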

Expressing Joint distributions
Example: Autonomous shooting device. Assume we have a simple model for a self-shooting device with three variables, X_burglar, X_alarm, X_shoot, each taking values in {0, 1}. The joint probability can be specified by giving the probability of each configuration P(x_bu, x_al, x_sh).
Joint distribution: We can specify a joint distribution as follows:
  X_burglar  X_alarm  X_shoot  P(.)
  0          0        0        0.576
  0          0        1        0.144
  0          1        0        0.024
  0          1        1        0.056
  1          0        0        0.032
  1          0        1        0.008
  1          1        0        0.048
  1          1        1        0.112
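To make the table concrete, here is a small sketch (my addition) that stores exactly this joint distribution as an array indexed by (x_bu, x_al, x_sh), checks that it is normalized, and reads off the brute-force marginal P(x_sh).

```python
import numpy as np

# Joint P(x_bu, x_al, x_sh) from the table, indexed as joint[x_bu, x_al, x_sh].
joint = np.array([
    [[0.576, 0.144],   # x_bu=0, x_al=0, x_sh in {0,1}
     [0.024, 0.056]],  # x_bu=0, x_al=1
    [[0.032, 0.008],   # x_bu=1, x_al=0
     [0.048, 0.112]],  # x_bu=1, x_al=1
])

print("sums to one:", np.isclose(joint.sum(), 1.0))
print("P(x_sh):", joint.sum(axis=(0, 1)))          # marginalize over x_bu and x_al
print("P(x_sh=1):", joint.sum(axis=(0, 1))[1])
```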

Graphical Models
Say we have the following graphical model for the previous example:
  X_bu → X_al → X_sh
Figure 1: Illustrating the joint distribution by a graphical model. Here the arrows show conditional dependencies.
Again: you cannot conclude that X_sh is independent of X_bu only because they are not directly connected.

Factorising joint distributions
Incorporating Independence: From the graphical model we can write the joint as
  p(x_bu, x_al, x_sh) = p(x_sh | x_al) p(x_al | x_bu) p(x_bu)
Alternative specification of the joint distribution: Use conditional probability tables according to the dependencies.
  X_bu  P(X_bu)
  0     0.8
  1     0.2

  X_bu  X_al  P(X_al | X_bu)
  0     0     0.9
  0     1     0.1
  1     0     0.2
  1     1     0.8

  X_al  X_sh  P(X_sh | X_al)
  0     0     0.8
  0     1     0.2
  1     0     0.3
  1     1     0.7
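A short sketch (not part of the slides) that builds the full joint from the three conditional probability tables via the factorization above and checks that it reproduces the joint table given earlier.

```python
import numpy as np

# Conditional probability tables from the slide, indexed by the variable values.
p_bu = np.array([0.8, 0.2])                    # P(x_bu)
p_al_given_bu = np.array([[0.9, 0.1],          # P(x_al | x_bu=0)
                          [0.2, 0.8]])         # P(x_al | x_bu=1)
p_sh_given_al = np.array([[0.8, 0.2],          # P(x_sh | x_al=0)
                          [0.3, 0.7]])         # P(x_sh | x_al=1)

# Joint via the factorization p(bu, al, sh) = p(sh | al) p(al | bu) p(bu).
joint = (p_bu[:, None, None]
         * p_al_given_bu[:, :, None]
         * p_sh_given_al[None, :, :])

print("P(0,0,0) =", joint[0, 0, 0])            # should equal 0.576 from the table
print("sums to one:", np.isclose(joint.sum(), 1.0))
```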

Elimination Algorithm
In the previous example we might be interested in the probability of the shooting device shooting (or not shooting) some target, i.e. P(x_sh):
  P(x_sh) = Σ_{x_al, x_bu} P(x_bu, x_al, x_sh)
We can use the factorization from above:
  P(x_sh) = Σ_{x_al, x_bu} P(x_sh | x_al) P(x_al | x_bu) P(x_bu)
We can push in the sum:
  P(x_sh) = Σ_{x_al} P(x_sh | x_al) Σ_{x_bu} P(x_al | x_bu) P(x_bu).
For a larger chain this makes a huge difference computationally: O(2^N) vs. O(N), with N the number of variables.
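A minimal sketch (my addition) of this elimination on the chain: first sum out x_bu, then x_al, and compare against the brute-force marginal of the full joint. The CPT values are the ones from the slide.

```python
import numpy as np

p_bu = np.array([0.8, 0.2])
p_al_given_bu = np.array([[0.9, 0.1], [0.2, 0.8]])   # rows: x_bu, cols: x_al
p_sh_given_al = np.array([[0.8, 0.2], [0.3, 0.7]])   # rows: x_al, cols: x_sh

# Push the sum in: eliminate x_bu first, then x_al.
p_al = p_bu @ p_al_given_bu          # sum_{x_bu} P(x_al | x_bu) P(x_bu)
p_sh = p_al @ p_sh_given_al          # sum_{x_al} P(x_sh | x_al) P(x_al)
print("elimination:", p_sh)

# Brute force over the full joint for comparison.
joint = p_bu[:, None, None] * p_al_given_bu[:, :, None] * p_sh_given_al[None, :, :]
print("brute force:", joint.sum(axis=(0, 1)))
```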

Inference in Bayesian Networks (Williams, 2006 and Barber, 2012)
[Figure: Bayesian network with nodes Rain, Sprinkler, Watson, Holmes]
Interesting questions:
- If Holmes' lawn is wet, was it the rain or the sprinkler?
- If Watson's lawn is also wet, how does this change things?
To answer these questions we must do inference. Let's plug some numbers in...

Why Bayesian Networks?
[Figure: Bayesian network with nodes Rain, Sprinkler, Holmes]
Recall Bayes' theorem: posterior ∝ likelihood × prior.
In our example:
- Prior (probability of the events occurring): p(r = yes, s = yes) = p(r = yes) p(s = yes)
- "Likelihood" (observation model): p(h = yes | r, s)
- Posterior (conditioning on observed evidence): p(s | h = yes), p(r | h = yes)
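The slides defer the actual numbers to the lecture. As a hedged illustration only, here is a sketch with made-up CPT values (the rain and sprinkler priors and the noisy-OR-like observation model are my assumptions, not the lecture's numbers) that computes the posteriors p(r | h = yes) and p(s | h = yes) by enumeration.

```python
# Assumed (illustrative) numbers, NOT from the lecture:
p_r = {True: 0.2, False: 0.8}                  # prior P(rain)
p_s = {True: 0.1, False: 0.9}                  # prior P(sprinkler)
# Observation model P(holmes_wet = yes | r, s), roughly a noisy OR (assumption).
p_h = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.01}

# Enumerate the joint P(r, s, h = yes) and condition on the evidence h = yes.
joint = {(r, s): p_r[r] * p_s[s] * p_h[(r, s)]
         for r in (True, False) for s in (True, False)}
p_h_yes = sum(joint.values())

p_r_given_h = sum(v for (r, s), v in joint.items() if r) / p_h_yes
p_s_given_h = sum(v for (r, s), v in joint.items() if s) / p_h_yes
print(f"P(r = yes | h = yes) = {p_r_given_h:.3f}")
print(f"P(s = yes | h = yes) = {p_s_given_h:.3f}")
```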

Next week & Open Master's thesis topics
Open master's thesis topics:
- Brain Connectivity Estimation
- Online Outlier Detection
- Sleep Classification
- Model Selection for Dynamical Systems
- Simulator for Robotics Platform
- Large-Scale Parameter Inference in Dynamical Systems
- Analysis of the effects of smoking using mass-spec. data
Plan for next week:
- Gaussian Mixture (EM)
- Variational Inference
- Belief Propagation

Questions?