Introduction to Bayesian statistics

Maxim Kochurov, EF MSU
November 15, 2016

Contents
1. Framework
   - Notations
2. Difference: Bayesians vs Frequentists
3. Knowledge transfer

Framework
- Treat everything as a random variable.
- A distribution over a variable represents our ignorance (uncertainty) about it.
- Use Bayes' theorem.

Framework: Notations

Bayes' theorem:

    p(A | B) = p(B, A) / p(B) = p(B | A) p(A) / p(B)

For parameters θ and data D:

    p(θ | D) = p(D | θ) p(θ) / p(D) = p(D | θ) p(θ) / ∫ p(D | θ) p(θ) dθ

- Likelihood p(D | θ): a measure of how well the data are described by the model with parameters θ.
- Prior p(θ): our beliefs about θ before seeing the data.
- Posterior p(θ | D): our beliefs about θ after seeing the data.
- Evidence p(D) = ∫ p(D | θ) p(θ) dθ: a measure of how probable the observed data are under the model.
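To make the four quantities concrete, here is a minimal numerical sketch (not part of the original slides): it evaluates likelihood, prior, evidence, and posterior on a grid, assuming a Bernoulli likelihood for coin flips and a flat prior.

```python
import numpy as np

# Hypothetical example (not from the slides): posterior over a coin's
# heads-probability theta, computed on a grid via Bayes' theorem.
theta = np.linspace(1e-6, 1 - 1e-6, 1001)      # grid over the parameter
prior = np.ones_like(theta)                    # flat prior p(theta)
prior /= np.trapz(prior, theta)                # normalize to integrate to 1

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])      # observed coin flips (1 = heads)
heads, tails = data.sum(), len(data) - data.sum()
likelihood = theta**heads * (1 - theta)**tails     # p(D | theta)

evidence = np.trapz(likelihood * prior, theta)     # p(D) = ∫ p(D | theta) p(theta) dtheta
posterior = likelihood * prior / evidence          # p(theta | D)

print("Posterior mean:", np.trapz(theta * posterior, theta))
```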

Framework

A Bayesian model is specified by the given data D, a likelihood function p(D | θ) (the probability of seeing the data under parameters θ), and prior beliefs p(θ) about the parameters.

Example: linear regression
- Data: D = (X, y) = {(x_i, y_i) : i = 1, ..., N}
- Likelihood: p(D | θ) = p(e | β), where e = y − Xβ and e ~ N(0, σ² I)
- Prior: p(θ) = p(β), with β ~ N(0, σ_β² I)
- Posterior: p(β | D) = p(D | β) p(β) / p(D)
- Maximizing p(β | D) over β gives the MAP solution; in this case it coincides with the ridge regression solution.
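A minimal sketch of that MAP solution (my own illustration, not from the slides): under the Gaussian likelihood and prior above, maximizing p(β | D) gives the ridge estimate β = (XᵀX + λI)⁻¹Xᵀy with penalty λ = σ²/σ_β².

```python
import numpy as np

# Hypothetical sketch (not from the slides): MAP estimate for the Bayesian
# linear regression above. With e ~ N(0, sigma^2 I) and beta ~ N(0, sigma_b^2 I),
# maximizing p(beta | D) gives the ridge solution with lambda = sigma^2 / sigma_b^2.
rng = np.random.default_rng(0)
N, p = 100, 3
X = rng.normal(size=(N, p))
beta_true = np.array([1.0, -2.0, 0.5])
sigma, sigma_b = 0.5, 1.0
y = X @ beta_true + rng.normal(scale=sigma, size=N)

lam = sigma**2 / sigma_b**2                                  # ridge penalty from the two variances
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print("MAP (ridge) estimate:", beta_map)
```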

Difference: Bayesians vs Frequentists

Bayesian and Frequentist approach differences:

                 Frequentist                 Bayesian
Randomness       Objective indefiniteness    Subjective ignorance
Variables        Random and deterministic    Everything is random
Inference        ML estimation               Posterior or MAP estimation
Applicability    n ≫ 1                       Any n
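A small illustration of the inference row (my own example, not from the slides), assuming a Beta(1, 1) prior on a coin's heads-probability: with very few observations the ML estimate can be extreme, while the posterior mean is pulled back toward the prior.

```python
import numpy as np

# Hypothetical illustration (not from the slides): ML vs Bayesian estimation
# of a coin's heads-probability from a tiny sample, using a Beta(1, 1) prior.
data = np.array([1, 1, 1])             # three flips, all heads
heads, n = data.sum(), len(data)

theta_ml = heads / n                   # ML estimate: 1.0, an extreme value
a, b = 1 + heads, 1 + (n - heads)      # Beta posterior parameters (conjugate update)
theta_map = (a - 1) / (a + b - 2)      # MAP estimate (equals ML under a flat prior)
theta_mean = a / (a + b)               # posterior mean, shrunk away from 1.0

print(theta_ml, theta_map, theta_mean) # 1.0, 1.0, 0.8
```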

Knowledge transfer

A story
- Suppose you have to estimate θ in some problem 1. All you have is your prior domain knowledge p(θ) and data D_1.
- You solve the problem and obtain the posterior p(θ | D_1). What's next?
- Use p(θ | D_1) in the future for problem 2: just treat the posterior p(θ | D_1) as the new prior.

    p(θ | D_2) = p(D_2 | θ) p(θ | D_1) / p(D_2)
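A minimal sketch of this reuse (my own example, not from the slides), assuming a Gaussian mean θ with known noise variance: the posterior after D_1 becomes the prior for D_2, and the result matches conditioning on both datasets at once.

```python
import numpy as np

# Hypothetical sketch (not from the slides): sequential updating of a Gaussian
# mean theta with known noise variance. The posterior after D_1 is reused as
# the prior for D_2, and matches updating on both datasets jointly.
def update(mu0, var0, data, noise_var):
    """Conjugate Normal-Normal update: prior N(mu0, var0) -> posterior."""
    var_post = 1.0 / (1.0 / var0 + len(data) / noise_var)
    mu_post = var_post * (mu0 / var0 + data.sum() / noise_var)
    return mu_post, var_post

rng = np.random.default_rng(1)
noise_var = 1.0
D1 = rng.normal(loc=2.0, scale=1.0, size=20)   # data for problem 1
D2 = rng.normal(loc=2.0, scale=1.0, size=30)   # data for problem 2

mu, var = update(0.0, 10.0, D1, noise_var)     # p(theta | D_1) from a broad prior
mu, var = update(mu, var, D2, noise_var)       # reuse p(theta | D_1) as the new prior

mu_batch, var_batch = update(0.0, 10.0, np.concatenate([D1, D2]), noise_var)
print(mu, var)                                 # equals the batch result below
print(mu_batch, var_batch)
```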