Introduction to Bayesian Statistics
Maxim Kochurov, EF MSU
November 15, 2016
Content
1. Framework
   Notations
2. Bayesians vs Frequentists
3. Knowledge transfer
Framework
- Treat everything as a random variable
- A distribution over a variable expresses our ignorance about it
- Use Bayes' theorem
Framework: Notations

Bayes' theorem:
p(A | B) = p(B, A) / p(B) = p(B | A) p(A) / p(B)
p(θ | D) = p(D | θ) p(θ) / p(D) = p(D | θ) p(θ) / ∫ p(D | θ) p(θ) dθ

Likelihood p(D | θ): a measure of how well the given model describes our data
Prior p(θ): our beliefs about θ before seeing the data
Posterior p(θ | D): our beliefs about θ after seeing the data
Evidence p(D) = ∫ p(D | θ) p(θ) dθ: a measure of how common our data is
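The four quantities above can be made concrete with a small discrete example (the numbers here are hypothetical, not from the slides): a coin's bias θ takes one of three candidate values under a uniform prior, and we observe 8 heads in 10 flips.

```python
# Discrete Bayes' theorem: posterior = likelihood * prior / evidence.
# Hypothetical example: coin bias theta with three candidate values.
from math import comb

thetas = [0.2, 0.5, 0.8]            # candidate parameter values
prior = {t: 1 / 3 for t in thetas}  # p(theta): uniform prior beliefs

heads, flips = 8, 10                # observed data D

def likelihood(theta):
    # p(D | theta): binomial probability of 8 heads in 10 flips
    return comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

# Evidence p(D) = sum over theta of p(D | theta) p(theta)
evidence = sum(likelihood(t) * prior[t] for t in thetas)

# Posterior p(theta | D) = p(D | theta) p(theta) / p(D)
posterior = {t: likelihood(t) * prior[t] / evidence for t in thetas}
```

After seeing mostly heads, the posterior concentrates on θ = 0.8, while the evidence normalizes the posterior to sum to one.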
Framework

A Bayesian model is specified by the observed data D, the likelihood function p(D | θ), which is the probability of seeing the data under parameters θ, and the prior beliefs p(θ) about the parameters.

Example: Linear Regression
D = (X, y) = {(x_i, y_i) | i = 1, …, N}
p(D | θ) = p(e | β); e = y − Xβ; e ∼ N(0, σ² I)
p(θ) = p(β); β ∼ N(0, σ_β² I)
p(β | D) = p(D | β) p(β) / p(D)

Maximizing p(β | D) over β gives the MAP solution; in this case it coincides with the ridge regression solution.
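A minimal sketch of the MAP-equals-ridge claim, using assumed one-dimensional data (single feature, no intercept) so the closed form stays scalar: with a Gaussian likelihood and a Gaussian prior on β, the MAP estimate is the ridge solution with penalty λ = σ²/σ_β².

```python
# MAP estimate for 1-D Bayesian linear regression (hypothetical data).
# Maximizing p(beta | D) = p(D | beta) p(beta) / p(D) over beta is
# equivalent to minimizing ||y - X beta||^2 + lam * beta^2 (ridge).
x = [1.0, 2.0, 3.0, 4.0]  # inputs X
y = [2.1, 3.9, 6.2, 7.8]  # targets

sigma2 = 1.0         # noise variance sigma^2 in e ~ N(0, sigma^2 I)
sigma2_beta = 10.0   # prior variance sigma_beta^2 in beta ~ N(0, sigma_beta^2 I)
lam = sigma2 / sigma2_beta

xtx = sum(xi * xi for xi in x)
xty = sum(xi * yi for xi, yi in zip(x, y))

# Ridge / MAP closed form in 1-D: beta = x'y / (x'x + lam)
beta_map = xty / (xtx + lam)

# With a flat prior (sigma_beta^2 -> infinity) MAP reduces to the ML / OLS fit.
beta_ml = xty / xtx
```

The prior shrinks the estimate toward zero; as σ_β² grows, the MAP solution approaches the maximum-likelihood one.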
Difference: Bayesians vs Frequentists

                 Frequentist                 Bayesian
Randomness       objective indefiniteness    subjective ignorance
Variables        random and deterministic    everything is random
Inference        ML estimation               posterior or MAP estimation
Applicability    n ≫ 1                       any n
Knowledge transfer: A story

Suppose you have to estimate θ in problem 1. All you have is your prior domain knowledge p(θ) and data D1.
You solve the problem and get the posterior p(θ | D1). What's next?
Use p(θ | D1) in the future for problem 2: just treat the posterior p(θ | D1) as a new prior.
p(θ | D2) = p(D2 | θ) p(θ | D1) / p(D2)
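This posterior-as-new-prior update can be sketched with a conjugate Beta-Bernoulli model (the counts below are hypothetical): updating on D1 and then on D2 gives the same posterior as a single batch update on the pooled data.

```python
# Knowledge transfer with a conjugate Beta-Bernoulli model:
# the posterior after problem 1 becomes the prior for problem 2.

def update(alpha, beta, heads, tails):
    # Beta(alpha, beta) prior + Bernoulli data -> Beta posterior
    return alpha + heads, beta + tails

prior = (1, 1)                        # p(theta): uniform Beta(1, 1)

d1 = (7, 3)                           # data D1: 7 heads, 3 tails
posterior_1 = update(*prior, *d1)     # p(theta | D1)

d2 = (2, 8)                           # data D2: 2 heads, 8 tails
# Treat p(theta | D1) as the new prior:
posterior_2 = update(*posterior_1, *d2)   # p(theta | D1, D2)

# Sequential updating matches one batch update on the pooled data:
batch = update(*prior, d1[0] + d2[0], d1[1] + d2[1])
assert posterior_2 == batch
```

This is exactly the slide's point: Bayesian inference composes, so yesterday's posterior carries the accumulated knowledge into tomorrow's problem.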