Probabilistic Graphical Models


Probabilistic Graphical Models: Introduction. Basic Probability and Bayes
Volkan Cevher, Matthias Seeger
Ecole Polytechnique Fédérale de Lausanne, 26/9/2011

Outline
1. Motivation
2. Probability. Decisions. Estimation
3. Bayesian Terminology

Motivation: Benefits of Doubt
"Not to be absolutely certain is, I think, one of the essential things in rationality." (B. Russell, 1947)

Real-world problems are uncertain: measurement errors; incomplete, ambiguous data; which model, which features?

If uncertainty is part of your problem, you can:
- Ignore/remove it: costly, complicated, not always possible.
- Live with it: quantify it (probabilities) and compute with it (Bayesian inference).
- Exploit it: experimental design, robust decision making, multimodal data integration.

Motivation: Image Reconstruction
[figure slide]

Motivation: Reconstruction is Ill-Posed
[figure slide]

Motivation: Image Statistics
Whatever images are... they are not Gaussian! Image gradients are super-Gaussian ("sparse"). Use a sparsity prior distribution P(u).

Motivation: Posterior Distribution
Likelihood P(y|u): data fit.
Prior P(u): signal properties.
Posterior distribution P(u|y): consistent summary of all information.

P(u|y) = P(y|u) P(u) / P(y)

Motivation: Estimation
Maximum a Posteriori (MAP) estimation:
u* = argmax_u P(y|u) P(u)
But there are many solutions. Why settle for any single one?
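To make MAP concrete, here is a minimal sketch in Python. The setup is hypothetical (not from the slides): a one-dimensional Gaussian likelihood combined with the Laplace sparsity prior motivated above, for which the MAP estimate has a closed form (soft-thresholding).

```python
import numpy as np

# Hypothetical setup: y = u + noise, noise ~ N(0, sigma^2),
# sparsity (Laplace) prior P(u) proportional to exp(-lam * |u|_1).
# The MAP estimate argmax_u P(y|u) P(u) minimizes
#   ||y - u||^2 / (2 sigma^2) + lam * |u|_1,
# which is solved elementwise by soft-thresholding.

def map_estimate(y, sigma=0.5, lam=2.0):
    thresh = lam * sigma**2
    return np.sign(y) * np.maximum(np.abs(y) - thresh, 0.0)

rng = np.random.default_rng(0)
u_true = np.array([0.0, 0.0, 3.0, 0.0, -2.0])   # sparse signal
y = u_true + 0.5 * rng.standard_normal(5)       # noisy observation
print(map_estimate(y))  # small entries are pruned to exactly 0
```

Note how the prior acts as complexity control here, but the result is still a single point estimate out of many plausible solutions.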

Motivation: Bayesian Inference. Use All Solutions
Weight each solution by our uncertainty and average over them. Integrate, don't prune.

Motivation: Bayesian Experimental Design
Posterior: uncertainty in the reconstruction.
Experimental design: find poorly determined directions; sequential search with interjacent partial measurements.

Motivation: Structure of Course
Graphical Models [~6 weeks]:
- Probabilistic database; expert system
- Query for making optimal decisions
- Graph separation ↔ conditional independence ⇒ (more) efficient computation (dynamic programming)
Approximate Inference [~6 weeks]:
- Bayesian inference is never really tractable
- Variational relaxations (convex duality)
- Propagation algorithms
- Sparse Bayesian models

Motivation: Course Goals
This course is not:
- Exhaustive (but we will give pointers)
- Playing with data until it works
- A purely theoretical analysis of methods
This course is:
- A computer scientist's view on Bayesian machine learning: layers above and below the formulae
- Understanding concepts (what to do, and why)
- Understanding approximations, relaxations, generic algorithmic schemata (how to do it, above the formulae)
- Safe implementation on a computer (how to do it, below the formulae)
- A red line through models and algorithms, exposing their roots in specialized computational mathematics

Probability. Decisions. Estimation: Why Probability?
Remember sleeping through Statistics 101 (p-values, t-tests, ...)? Forget that impression! Probability leads to beautiful, useful insights and algorithms. Not much would work today without probabilistic algorithms and decisions from incomplete knowledge.
- Numbers, functions, moving bodies ⇒ calculus
- Predicates, true/false statements ⇒ predicate logic
- Uncertain knowledge about numbers, predicates, ... ⇒ probability
Machine learning? You have to speak probability! This is a crash course; dig further, it is worth it:
- Grimmett, Stirzaker: Probability and Random Processes
- Pearl: Probabilistic Reasoning in Intelligent Systems

Probability. Decisions. Estimation: Why Probability?
Reasons to use probability (forget the classical straitjacket):
- We really don't / cannot know (exactly)
- It would be too complicated or costly to find out
- It would take too long to compute
- Nondeterministic processes (given the measurement resolution)
- Subjective beliefs, interpretations

Probability. Decisions. Estimation: Probability over Finite/Countable Sets
You know databases? You know probability!
- Probability distribution P: joint table/hypercube (entries ≥ 0, summing to 1)
- Random variable F: index of the table
- Event E: part of the table
- Probability P(E): sum over the cells in E
- Marginal distribution P(F): projection of the table (sum over the others)

Joint table P(F, B):
             B = red   B = blue
F = apple     1/10      9/20
F = orange    3/10      3/20

Marginalization (Sum Rule): not interested in some variable(s) right now? Marginalize over them!
P(F) = Σ_{B=r,b} P(F, B)
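As a minimal sketch in Python (the array layout is my choice; the numbers are the slide's table), the sum rule is literally a sum over one axis of the joint table:

```python
import numpy as np

# The fruit/box joint table P(F, B) from the slide.
# Rows index F in {apple, orange}; columns index B in {red, blue}.
P_FB = np.array([[1/10, 9/20],
                 [3/10, 3/20]])
assert np.isclose(P_FB.sum(), 1.0)   # entries >= 0, summing to 1

# Sum rule: marginalize B out by summing over columns, and vice versa.
P_F = P_FB.sum(axis=1)   # P(F) = [P(apple), P(orange)] = [0.55, 0.45]
P_B = P_FB.sum(axis=0)   # P(B) = [P(red), P(blue)]     = [0.40, 0.60]
print(P_F, P_B)
```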

Probability. Decisions. Estimation: Probability over Finite/Countable Sets (II)
Conditional probability: factorization of the table.
- Chop out the part you are sure about (don't marginalize: you know!)
- Renormalize to 1 (divide by the marginal)

Conditioning (Product Rule): observed some variable or event? Condition on it!
Joint = Conditional × Marginal: P(F, B) = P(F|B) P(B)

Information propagation (B → F): predict, then marginalize:
P(F) = Σ_B P(F|B) P(B)
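Continuing the same sketch, conditioning is "chop out and renormalize", and information propagation is a matrix-vector product (variable names are mine):

```python
import numpy as np

P_FB = np.array([[1/10, 9/20],
                 [3/10, 3/20]])   # rows F = apple/orange, cols B = red/blue
P_B = P_FB.sum(axis=0)

# Product rule rearranged: P(F|B) = P(F, B) / P(B).
# Chop out a column of the table and renormalize it to 1.
P_F_given_B = P_FB / P_B          # column b holds P(F | B=b)
print(P_F_given_B[:, 0])          # P(F | B=red) = [0.25, 0.75]

# Information propagation B -> F: P(F) = sum_B P(F|B) P(B).
print(P_F_given_B @ P_B)          # recovers the marginal [0.55, 0.45]
```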

Probability. Decisions. Estimation: Probability over Finite/Countable Sets (III)
Bayes' Formula
P(B|F) P(F) = P(F, B) = P(F|B) P(B), hence
P(B|F) = P(F|B) P(B) / P(F)
- Inversion of the information flow: causal → diagnostic (diseases → symptoms)
- Inverse problems

Chain rule of probability:
P(X_1, ..., X_n) = P(X_1) P(X_2|X_1) P(X_3|X_1, X_2) ... P(X_n|X_1, ..., X_{n-1})
Holds in any ordering. Starting point for Bayesian networks [next lecture].
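The same table also illustrates Bayes' formula as inversion of the information flow (again a minimal sketch with my own variable names):

```python
import numpy as np

P_FB = np.array([[1/10, 9/20],
                 [3/10, 3/20]])
P_B = P_FB.sum(axis=0)            # P(B)
P_F = P_FB.sum(axis=1)            # P(F)
P_F_given_B = P_FB / P_B          # "causal" direction

# Bayes: P(B|F) = P(F|B) P(B) / P(F), the "diagnostic" direction.
P_B_given_F = P_F_given_B * P_B / P_F[:, None]
print(P_B_given_F[0])             # P(B | F=apple) = [2/11, 9/11]
assert np.allclose(P_B_given_F.sum(axis=1), 1.0)
```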

Probability. Decisions. Estimation: Probability over Continuous Variables
[figure slide]

Probability. Decisions. Estimation: Probability over Continuous Variables (II)
Caveat: null sets [P({x = 5}) = 0; P({x ∈ ℕ}) = 0]. Every observed event is a null set! P(y|x) = P(y, x)/P(x) cannot work when P(x) = 0.
Instead, define the conditional density P(y|x) such that
P(y, x ∈ A) = ∫_A P(y|x) P(x) dx   for all events A.
In most practical cases: look at y ↦ P(y, x) ("plug in x"), then recognize the density / normalize.
Another (technical) caveat: not all subsets can be events. Events are the "nice" (measurable) subsets.
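A tiny numeric sketch of "plug in x, recognize/normalize" on a grid; the joint density here is a toy Gaussian of my own, not from the slides:

```python
import numpy as np

# Toy joint density P(y, x) proportional to exp(-(x^2 + (y - x)^2) / 2).
ys = np.linspace(-5, 5, 2001)
dy = ys[1] - ys[0]
x_obs = 1.5

joint_slice = np.exp(-(x_obs**2 + (ys - x_obs)**2) / 2)  # y -> P(y, x_obs)
cond = joint_slice / (joint_slice.sum() * dy)            # normalize over y

# Recognizing the density instead: P(y|x) = N(y; x, 1).
print((ys * cond).sum() * dy)    # conditional mean, approx. x_obs = 1.5
```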

Probability. Decisions. Estimation: Expectation. Moments of a Distribution
Expectation:
E[f(x)] = ∫ f(x) P(x) dx   or   Σ_x f(x) P(x)

P(x) complicated? How does x ~ P(x) behave? Moments capture the essential behaviour of a distribution.
Mean (1st order): E[x] = ∫ x P(x) dx
Covariance (2nd order): Cov[x, y] = E[x y^T] - E[x] (E[y])^T = E[v_x v_y^T], with v_x = x - E[x]; and Cov[x] = Cov[x, x].
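A minimal Monte Carlo sketch (the distribution is my own choice) connecting these definitions to code:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0],
              [0.8, 0.6]])
# x = A z + mu with z ~ N(0, I), so E[x] = mu and Cov[x] = A A^T.
x = rng.standard_normal((100_000, 2)) @ A.T + np.array([1.0, -2.0])

mean = x.mean(axis=0)                 # estimates E[x], close to [1, -2]
v = x - mean                          # v_x = x - E[x]
cov = v.T @ v / (len(x) - 1)          # estimates Cov[x] = E[v_x v_x^T]
print(mean)
print(cov)                            # close to A @ A.T = [[1, .8], [.8, 1]]
```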

Probability. Decisions. Estimation: Decision Theory in 30 Seconds
Recipe for optimal decisions:
1. Choose a loss function L (depends on the problem, your valuation, your susceptibility).
2. Model actions A and outcomes O; compute P(O|A).
3. Compute risks (expected losses): R(A) = ∫ L(O) P(O|A) dO.
4. Go for A* = argmin_A R(A).
Special case, pricing of A (an option, a bet, a car): choose R(A) + margin.
Next best measurement? Next best scientific experiment? Harder if timing plays a role (optimal control, etc.).
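The recipe on a discrete toy problem (all numbers and names are my own; the loss here depends on the action as well as the outcome, a mild generalization of the slide's L(O)):

```python
import numpy as np

# Actions: carry an umbrella or not. Outcomes: [rain, sun].
P_O_given_A = {                           # step 2: P(O | A)
    "umbrella":    np.array([0.3, 0.7]),
    "no umbrella": np.array([0.3, 0.7]),  # weather ignores our choice
}
loss = {                                  # step 1: loss L(O, A)
    "umbrella":    np.array([1.0, 1.0]),   # mild nuisance either way
    "no umbrella": np.array([10.0, 0.0]),  # soaked if it rains
}

# Steps 3-4: risk R(A) = sum_O L(O, A) P(O|A), then argmin over A.
risk = {a: float(loss[a] @ P_O_given_A[a]) for a in loss}
best = min(risk, key=risk.get)
print(risk, "->", best)   # {'umbrella': 1.0, 'no umbrella': 3.0} -> umbrella
```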

Probability. Decisions. Estimation: Maximum Likelihood Estimation
Bayesian inversion is hard. In simple cases, with enough data:
Maximum Likelihood Estimation: observed {x_i}; interested in the cause θ. Construct a sampling model P(x|θ). The likelihood L(θ) = P(D|θ) = Π_i P(x_i|θ) should be high close to the true θ_0. Maximum likelihood estimator:
θ̂ = argmax_θ L(θ) = argmax_θ log L(θ) = argmax_θ Σ_i log P(x_i|θ)
Method of choice for simple θ and lots of data; well understood asymptotically. But knowledge about θ besides D is not used, and it breaks down if θ is larger than D. And another problem...
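A minimal MLE sketch (the sampling model is my own pick): for x_i ~ N(θ, 1) the log likelihood is maximized by the sample mean, which a grid search confirms:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0
x = theta_true + rng.standard_normal(500)   # observed data D = {x_i}

def log_lik(theta):
    # sum_i log N(x_i; theta, 1), up to an additive constant
    return -0.5 * np.sum((x - theta) ** 2)

grid = np.linspace(0.0, 4.0, 4001)
theta_hat = grid[np.argmax([log_lik(t) for t in grid])]
print(theta_hat, x.mean())   # both close to theta_true = 2.0
```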

Probability. Decisions. Estimation: Overfitting
Overfitting problem (estimation): for finite D, more and more complicated models fit D better and better. With a huge brain, you just learn by heart. Generalization only comes with a limit on complexity!
Marginalization solves this problem, but even Bayesian estimation ("half-way marginalization") embodies complexity control.
[Figure: three fits of increasing complexity to the same data on x ∈ [0, 1]]

Bayesian Terminology: Probabilistic Model
Model: a concise description of the joint distribution (generative process) of all variables of interest, encoding assumptions: what are the entities, and how do they relate? Variables have different roles, and roles may change depending on what the model is used for. The model specifies the variables and their (in)dependencies ⇒ graphical models [next lecture].

Bayesian Terminology: The Linear Model (Polynomial Fitting)
Fit data with a polynomial (degree k).
Linear model: y = X w + ε, where
- y: responses (observed)
- X: design matrix (controlled)
- w: weights (query)
- ε: noise (nuisance)
Prior P(w): restrictions on w, prior knowledge ⇒ prefer smaller ||w||.
Likelihood P(y|w).
Posterior: P(w|y) = P(y|w) P(w) / P(y)
Prediction: ŷ = x^T E[w|y]
Marginal likelihood: P(y) = P(y|k) = ∫ P(y|w) P(w) dw
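A minimal sketch of this model in Python (the noise and prior scales are my own assumptions): with a Gaussian prior w ~ N(0, τ²I) and Gaussian noise, both the posterior mean E[w|y] and the marginal likelihood P(y|k) are available in closed form:

```python
import numpy as np

def posterior_and_evidence(x, y, k, s2=0.1, t2=1.0):
    """Posterior mean E[w|y] and log evidence log P(y|k) for
    y = X w + eps, eps ~ N(0, s2 I), prior w ~ N(0, t2 I)."""
    X = np.vander(x, k + 1, increasing=True)    # polynomial design, degree k
    n, d = X.shape
    A = X.T @ X / s2 + np.eye(d) / t2           # posterior precision
    w_mean = np.linalg.solve(A, X.T @ y / s2)   # E[w|y]
    C = t2 * X @ X.T + s2 * np.eye(n)           # marginal covariance of y
    _, logdet = np.linalg.slogdet(C)
    log_ev = -0.5 * (y @ np.linalg.solve(C, y) + logdet + n * np.log(2 * np.pi))
    return w_mean, log_ev

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 30)
y = 1.0 - 2.0 * x + 0.5 * x**3 + 0.3 * rng.standard_normal(30)
w_mean, log_ev = posterior_and_evidence(x, y, k=3)
print(w_mean)   # prediction at x* is then phi(x*)^T w_mean
```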

Bayesian Terminology: The Linear Model (Polynomial Fitting)
[figure slides: polynomial fits to data on x ∈ [-2, 2]]

Bayesian Terminology: Model Selection
[Figure: posterior fit P(w|y, k = 3) (left); marginal likelihood P(y|k) against k = 0, ..., 4 (right)]
Simpler hypotheses are considered as well ⇒ Occam's razor.

Bayesian Terminology: Model Averaging
Don't know the polynomial order k? ⇒ Marginalize it out:
E[ŷ|y] = Σ_{k≥1} P(k|y) x_(k)^T E[w|y, k]
[Figure: model-averaged fit (left); posterior over k (right)]
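Reusing posterior_and_evidence (and the data x, y) from the sketch above, under the same assumptions, model selection and model averaging over k fit in a few lines:

```python
import numpy as np

degrees = range(5)
results = {k: posterior_and_evidence(x, y, k) for k in degrees}

# Occam's razor: turn the evidences into P(k|y) under a uniform prior on k.
log_ev = np.array([results[k][1] for k in degrees])
post_k = np.exp(log_ev - log_ev.max())
post_k /= post_k.sum()
print(dict(zip(degrees, np.round(post_k, 3))))   # mass near the cubic, k = 3

# Model averaging: E[y*|y] = sum_k P(k|y) phi_k(x*)^T E[w|y, k].
x_star = 0.5
y_star = sum(post_k[k] * (np.vander([x_star], k + 1, increasing=True)[0]
                          @ results[k][0]) for k in degrees)
print(y_star)
```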