Probabilistic Machine Learning


Probabilistic Machine Learning Bayesian Nets, MCMC, and more Marek Petrik 4/18/2017 Based on: Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. Chapter 10.

Conditional Independence Independent random variables: P[X, Y] = P[X] P[Y]. Convenient, but not true often enough. Conditional independence X ⊥ Y | Z: P[X, Y | Z] = P[X | Z] P[Y | Z]. We use conditional independence throughout machine learning.

Dependent but Conditionally Independent Events with a possibly biased coin: 1. X: your first coin flip is heads 2. Y: your second flip is heads 3. Z: the coin is biased. X and Y are not independent, but X and Y are independent given Z.
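The numbers below are an illustrative assumption, not from the lecture: the coin is heavily biased toward heads with probability 1/2, and toward tails otherwise. Exact enumeration shows the two flips are marginally dependent but conditionally independent given the bias:

```python
from fractions import Fraction as F

# Hypothetical mixture: with probability 1/2 the coin is biased toward
# heads (P[heads] = 9/10), otherwise toward tails (P[heads] = 1/10).
p_biased = F(1, 2)
p_heads = {True: F(9, 10), False: F(1, 10)}  # keyed by Z (coin is biased)

# Marginal of a single flip: P[X] = sum_z P[Z = z] P[heads | z]
p_x = sum(pz * p_heads[z] for z, pz in [(True, p_biased), (False, 1 - p_biased)])

# Joint of both flips: they are conditionally independent given Z
p_xy = sum(pz * p_heads[z] ** 2 for z, pz in [(True, p_biased), (False, 1 - p_biased)])

print(p_x * p_x)  # 1/4
print(p_xy)       # 41/100 -- not equal, so X and Y are dependent
# Given Z = biased, P[X, Y | Z] = P[X | Z] P[Y | Z] holds exactly:
print(p_heads[True] ** 2)  # 81/100
```

Using exact fractions makes the (in)equalities unambiguous rather than subject to floating-point noise.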

Independent but Conditionally Dependent Is this possible? Yes! Events with an unbiased coin: 1. X: your first coin flip is heads 2. Y: your second flip is heads 3. Z: the two coin flips are the same. X and Y are independent, but X and Y are not independent given Z.
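This case can be checked by enumerating the four equally likely outcomes of two fair flips (a sketch, with the events defined as above):

```python
from fractions import Fraction as F
from itertools import product

# Two independent fair coin flips (True = heads); each outcome has probability 1/4.
outcomes = list(product([True, False], repeat=2))
p = F(1, 4)

def prob(event):
    return sum(p for o in outcomes if event(o))

X = lambda o: o[0]          # first flip is heads
Y = lambda o: o[1]          # second flip is heads
Z = lambda o: o[0] == o[1]  # the two flips are the same

# Marginally independent: P[X, Y] = P[X] P[Y]
print(prob(lambda o: X(o) and Y(o)) == prob(X) * prob(Y))  # True

# Conditioning on Z couples them: P[X, Y | Z] != P[X | Z] P[Y | Z]
p_z = prob(Z)
p_xy_z = prob(lambda o: X(o) and Y(o) and Z(o)) / p_z  # = 1/2
p_x_z = prob(lambda o: X(o) and Z(o)) / p_z            # = 1/2
print(p_xy_z, p_x_z * p_x_z)  # 1/2 vs 1/4
```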

Conditional Independence in Machine Learning Linear regression LDA Naive Bayes

Directed Graphical Models Represent complex structure of conditional independence. A node is independent of all of its predecessors given its parents: x_s ⊥ x_{pred(s) \ pa(s)} | x_{pa(s)}.

Undirected Graphical Models Another (different) representation of conditional independence: Markov Random Fields (MRFs).

Naive Bayes Model Closely related to QDA and LDA

Naive Bayes Model Chain rule: P[x_1, x_2, x_3] = P[x_1] P[x_2 | x_1] P[x_3 | x_1, x_2]. Naive Bayes factorization: P[x, y] = P[y] ∏_{j=1}^{D} P[x_j | y].

Why Bother with Conditional Independence? Reduces the number of parameters. Does it reduce bias or variance?
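One way to see the parameter savings, under the assumed setup of a binary class y with D binary features (the counting is standard, but the setup is illustrative rather than from the slides):

```python
# Parameters needed to specify P[x_1, ..., x_D, y] over binary variables.
def full_joint_params(D):
    # One probability per joint outcome, minus one (they must sum to 1).
    return 2 ** (D + 1) - 1

def naive_bayes_params(D):
    # P[y] needs 1 parameter; each P[x_j | y] needs 2 (one per class).
    return 1 + 2 * D

for D in (5, 10, 20):
    print(D, full_joint_params(D), naive_bayes_params(D))
# Exponential growth vs. linear growth in D.
```

Fewer parameters means lower variance at the risk of higher bias when the independence assumption is wrong, which is the trade-off the slide's question points at.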

Markov Chain 1st order Markov chain: each state depends only on the immediately preceding state. 2nd order Markov chain: each state depends on the two preceding states.
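A minimal 1st-order chain sketch; the two-state transition matrix is an invented example, not from the lecture:

```python
import numpy as np

# Hypothetical 1st-order Markov chain over two states (e.g., sunny/rainy).
P = np.array([[0.9, 0.1],   # from state 0: stay with 0.9, switch with 0.1
              [0.5, 0.5]])  # from state 1: switch with 0.5, stay with 0.5

# Distribution after t steps: pi_t = pi_0 @ P^t
pi = np.array([1.0, 0.0])
for _ in range(100):
    pi = pi @ P

print(pi)  # converges to the stationary distribution [5/6, 1/6]
```

The stationary distribution solves pi = pi @ P; repeated multiplication converges here because the chain is irreducible and aperiodic.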

Uses of Markov Chains Time series prediction Simulation of stochastic systems Inference in Bayesian nets and models Many others...

Hidden Markov Models Used for: speech and language recognition, time series prediction. Kalman filter: the version with normal distributions, used in GPS receivers.

Inference Inference of hidden variables y: P[y | x_v, θ] = P[y, x_v | θ] / P[x_v | θ]. Eliminating nuisance variables (e.g., x_1 is not observed): P[y | x_2, θ] = Σ_{x_1} P[y, x_1 | x_2, θ]. What is inference in linear regression?
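Nuisance-variable elimination on a toy joint table; the table values are made up for illustration:

```python
from fractions import Fraction as F

# Hypothetical joint P[y, x1, x2] over binary variables, used to show
# eliminating the unobserved nuisance variable x1.
joint = {
    (0, 0, 0): F(1, 8),  (0, 0, 1): F(1, 8),
    (0, 1, 0): F(1, 8),  (0, 1, 1): F(1, 8),
    (1, 0, 0): F(1, 16), (1, 0, 1): F(3, 16),
    (1, 1, 0): F(3, 16), (1, 1, 1): F(1, 16),
}

def p_y_given_x2(y, x2):
    # P[y | x2] = sum_{x1} P[y, x1, x2] / sum_{y', x1} P[y', x1, x2]
    num = sum(joint[(y, x1, x2)] for x1 in (0, 1))
    den = sum(joint[(yy, x1, x2)] for yy in (0, 1) for x1 in (0, 1))
    return num / den

print(p_y_given_x2(1, 1))  # 1/2
```

The same sum-out-then-normalize pattern is what exact inference algorithms (e.g., variable elimination) perform, just organized to avoid building the full table.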

Learning Computing the conditional distribution of parameters θ given data. Approaches: 1. Maximum A Posteriori (MAP): argmax_θ log P[θ | x] = argmax_θ (log P[x | θ] + log P[θ]). 2. Inference: infer the distribution of θ given x, then return its mode, median, mean, or anything appropriate. Fixed effects vs. random effects (mixed effects models).
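A MAP sketch for a hypothetical coin-flip model with a Beta(a, b) prior; the closed-form mode comes from the standard Beta posterior, and the specific numbers are illustrative:

```python
import math

# Assumed model: coin with heads probability theta, Beta(a, b) prior,
# n flips of which h are heads.
def map_estimate(h, n, a=2.0, b=2.0):
    # argmax_theta [log P[x | theta] + log P[theta]] has the closed form
    # (h + a - 1) / (n + a + b - 2): the mode of the Beta(a + h, b + n - h) posterior.
    return (h + a - 1) / (n + a + b - 2)

def log_posterior(theta, h, n, a=2.0, b=2.0):
    # Unnormalized log posterior: log likelihood + log prior.
    return (h + a - 1) * math.log(theta) + (n - h + b - 1) * math.log(1 - theta)

theta_hat = map_estimate(h=7, n=10)
print(theta_hat)  # 2/3: pulled toward 1/2 relative to the MLE 0.7
```

The shrinkage toward the prior mode is exactly the regularizing effect of the log P[θ] term in the MAP objective.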

Inference in Practice Exact inference is often intractable. Variational inference: approximate the model. Markov Chain Monte Carlo (MCMC): Gibbs sampling, Metropolis-Hastings, and others.
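A minimal Metropolis-Hastings sketch; the standard normal target and random-walk proposal are illustrative choices, not from the lecture:

```python
import math
import random

def log_target(x):
    # Log of the unnormalized target density exp(-x^2 / 2);
    # MCMC only needs the target up to a constant.
    return -0.5 * x * x

def metropolis_hastings(n_samples, step=1.0, seed=0):
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian random-walk proposal.
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, target(proposal) / target(x)).
        if math.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)  # on rejection, the current state is repeated
    return samples

samples = metropolis_hastings(20000)
mean = sum(samples) / len(samples)
print(mean)  # should be near 0.0, the mean of the target
```

Gibbs sampling is the special case where each coordinate is resampled from its exact conditional, so every proposal is accepted.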

Probabilistic Modeling Languages A simple framework to describe a Bayesian model; inference with MCMC and parameter search. Popular frameworks: JAGS; BUGS (WinBUGS, OpenBUGS); Stan. Examples: linear regression, ridge regression, lasso.
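The ridge-regression example can be read as MAP estimation with a Gaussian prior on the weights; a NumPy sketch under that assumption, on synthetic data:

```python
import numpy as np

# Gaussian noise likelihood + Gaussian prior on the weights w gives the
# closed-form MAP estimate w = (X^T X + lam * I)^{-1} X^T y, i.e. ridge
# regression with regularization strength lam.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

lam = 1.0
w_map = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(w_map)  # shrunk toward zero relative to the least-squares fit
```

A Laplace prior instead of the Gaussian gives the lasso, which no longer has a closed form but fits the same MAP reading.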