Lecture 18: Bayesian Inference

1 Lecture 18: Bayesian Inference. Hyang-Won Lee, Dept. of Internet & Multimedia Eng., Konkuk University.

2 Bayesian Statistical Inference
Statistical inference: the process of estimating information about an unknown variable/model. For example, a biased coin with a head coming up w.p. $p$ (unknown).
Bayesian vs. classical inference:
- Bayesian: unknowns are random variables with known distributions; prior $p_\Theta(\theta)$, posterior $p_{\Theta|X}(\theta|x)$ ($x$: observed data)
- Classical: unknowns are deterministic quantities that happen to be unknown; $\theta$ is a constant, estimated with some performance guarantee

3 Bayesian Statistical Inference (contd.)
Inference (model/variable) problems:
i) Model inference: construct a model of a process and predict the future (e.g., weather forecasting)
ii) Variable inference: estimate an unknown variable (e.g., GPS readings and current position)
- Example: noisy channel
i) a sequence of binary messages $S_i \in \{0, 1\}$ is transmitted over a wireless channel
ii) the receiver observes $X_i = a S_i + W_i$, $i = 1, \dots, n$, where $W_i \sim N(0, \sigma^2)$ and $a$ is a scalar
iii) model inference problem: $a$ unknown ($S_i$'s known)
iv) variable inference problem: infer the $S_i$'s ($a$ known) based on the $X_i$'s

4 Bayesian Statistical Inference
Types of statistical inference problems:
- Estimation (of an unknown constant or RV)
- Hypothesis testing (binary or m-ary)
Bayesian inference methods:
i) Maximum a posteriori probability (MAP) rule
ii) Least mean squares (LMS) estimation
iii) Linear least mean squares estimation

5 Bayesian Inference and Posterior Distribution
Pictorial introduction. Bayes rule:
(i) $\Theta$ discrete, $X$ discrete: $p_{\Theta|X}(\theta|x) = \dfrac{p_\Theta(\theta)\,p_{X|\Theta}(x|\theta)}{\sum_{\theta'} p_\Theta(\theta')\,p_{X|\Theta}(x|\theta')}$
(ii) $\Theta$ discrete, $X$ continuous: $p_{\Theta|X}(\theta|x) = \dfrac{p_\Theta(\theta)\,f_{X|\Theta}(x|\theta)}{\sum_{\theta'} p_\Theta(\theta')\,f_{X|\Theta}(x|\theta')}$
(iii) $\Theta$ continuous, $X$ discrete: $f_{\Theta|X}(\theta|x) = \dfrac{f_\Theta(\theta)\,p_{X|\Theta}(x|\theta)}{\int f_\Theta(\theta')\,p_{X|\Theta}(x|\theta')\,d\theta'}$
(iv) $\Theta$ continuous, $X$ continuous: $f_{\Theta|X}(\theta|x) = \dfrac{f_\Theta(\theta)\,f_{X|\Theta}(x|\theta)}{\int f_\Theta(\theta')\,f_{X|\Theta}(x|\theta')\,d\theta'}$
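
To make case (i) concrete, here is a minimal Python sketch of the discrete-discrete version of Bayes rule; the three candidate biases and the uniform prior are made-up numbers for illustration, not from the lecture:

```python
import numpy as np

# Hypothetical discrete prior over three candidate coin biases
thetas = np.array([0.2, 0.5, 0.8])        # candidate values of theta
prior = np.array([1/3, 1/3, 1/3])         # p_Theta(theta)

# Likelihood of observing x = "heads" under each theta: p_{X|Theta}(x|theta)
likelihood = thetas                        # P(heads | theta) = theta

# Bayes rule, case (i): posterior = prior * likelihood / normalizer
posterior = prior * likelihood
posterior /= posterior.sum()

print(posterior)   # [0.1333..., 0.3333..., 0.5333...]
```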

6 Conditional Probability Revisited
Four versions of conditional probability:
(i) $p_{\Theta|X}(\theta|x) = \dfrac{p_{\Theta,X}(\theta,x)}{p_X(x)}$
(ii) $p_{\Theta|X}(\theta|x) = P(\Theta = \theta \mid X = x) = \lim_{\delta \to 0} P(\Theta = \theta \mid x \le X \le x+\delta) = \lim_{\delta \to 0} \dfrac{p_\Theta(\theta)\,P(x \le X \le x+\delta \mid \Theta = \theta)}{P(x \le X \le x+\delta)} = \dfrac{p_\Theta(\theta)\,f_{X|\Theta}(x|\theta)}{\sum_{\theta'} p_\Theta(\theta')\,f_{X|\Theta}(x|\theta')} = \dfrac{p_\Theta(\theta)\,f_{X|\Theta}(x|\theta)}{f_X(x)}$
(iii) $f_{\Theta|X}(\theta|x) = \lim_{\delta \to 0} \dfrac{P(\theta \le \Theta \le \theta+\delta \mid X = x)}{\delta} = \lim_{\delta \to 0} \dfrac{P(\theta \le \Theta \le \theta+\delta)\,P(X = x \mid \theta \le \Theta \le \theta+\delta)}{\delta\,P(X = x)} = \dfrac{f_\Theta(\theta)\,p_{X|\Theta}(x|\theta)}{p_X(x)}$
(iv) $f_{\Theta|X}(\theta|x) = \dfrac{f_\Theta(\theta)\,f_{X|\Theta}(x|\theta)}{f_X(x)}$

7 Example: Romeo & Juliet I
- Juliet will be late on any date by a random amount $X \sim U(0, \theta)$
- $\theta$ unknown, modeled as an RV $\Theta \sim U(0, 1)$
- Assume Juliet was late by an amount $x$ on the first date
- How to update the distribution of $\Theta$?
What we know:
1) the prior PDF: $f_\Theta(\theta) = 1$ if $0 \le \theta \le 1$, and $0$ otherwise
2) the conditional PDF of the observation: $f_{X|\Theta}(x|\theta) = 1/\theta$ if $0 \le x \le \theta$, and $0$ otherwise
Posterior PDF: $f_{\Theta|X}(\theta|x) = \dfrac{f_\Theta(\theta)\,f_{X|\Theta}(x|\theta)}{\int_0^1 f_\Theta(\theta')\,f_{X|\Theta}(x|\theta')\,d\theta'}$, so $f_{\Theta|X}(\theta|x) = 0$ if $\theta < x$ or $\theta > 1$, and for $x \le \theta \le 1$,
$f_{\Theta|X}(\theta|x) = \dfrac{1/\theta}{\int_x^1 (1/\theta')\,d\theta'} = \dfrac{1}{\theta\,|\ln x|}$
(since $\int_x^1 (1/\theta')\,d\theta' = -\ln x = |\ln x|$ for $0 < x < 1$)
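
A quick numerical check of this posterior (not in the slides): discretize $\theta$, normalize $f_\Theta(\theta)\,f_{X|\Theta}(x|\theta)$ on a grid, and compare with the closed form $1/(\theta\,|\ln x|)$. The observed value $x = 0.3$ is an assumption for illustration:

```python
import numpy as np

x = 0.3                                    # hypothetical observed lateness
theta = np.linspace(1e-4, 1.0, 200001)     # grid over the support of Theta
dtheta = theta[1] - theta[0]

# Unnormalized posterior: f_Theta(theta) * f_{X|Theta}(x|theta)
unnorm = np.where(theta >= x, 1.0 / theta, 0.0)

# Normalize numerically and compare with the closed form 1/(theta*|ln x|)
post = unnorm / (unnorm.sum() * dtheta)
closed = np.where(theta >= x, 1.0 / (theta * abs(np.log(x))), 0.0)

print(np.max(np.abs(post - closed)))       # small (numerical integration error)
```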

8 Example ii): Romeo & Juliet II
- Observe the first $n$ dates
- Juliet is late by $X_1, X_2, \dots, X_n \sim U(0, \theta)$ (independent) given $\Theta = \theta$
- Let $X = [X_1, X_2, \dots, X_n]$, $x = [x_1, x_2, \dots, x_n]$
- Conditional PDF: $f_{X|\Theta}(x|\theta) = f_{X_1|\Theta}(x_1|\theta) \cdots f_{X_n|\Theta}(x_n|\theta) = \dfrac{1}{\theta^n}$ if $\max\{x_1, \dots, x_n\} \le \theta \le 1$, and $0$ otherwise
- Posterior PDF: with $\bar x = \max\{x_1, \dots, x_n\}$, $f_{\Theta|X}(\theta|x) = \dfrac{f_\Theta(\theta)\,f_{X|\Theta}(x|\theta)}{\int_0^1 f_\Theta(\theta')\,f_{X|\Theta}(x|\theta')\,d\theta'} = \dfrac{1/\theta^n}{\int_{\bar x}^1 \big(1/(\theta')^n\big)\,d\theta'}$ if $\bar x \le \theta \le 1$, and $0$ otherwise
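
A short sketch of this posterior in code; the denominator is evaluated in closed form, $\int_{\bar x}^1 (\theta')^{-n}\,d\theta' = (\bar x^{\,1-n} - 1)/(n - 1)$ for $n \ge 2$, and the true $\theta$ and sample size below are assumed values:

```python
import numpy as np

rng = np.random.default_rng(0)
true_theta = 0.7                              # hypothetical true value
x = rng.uniform(0, true_theta, size=5)        # n = 5 observed latenesses
xbar, n = x.max(), len(x)

# Normalizing constant: integral of theta^{-n} over [max x_i, 1] (n >= 2)
Z = (xbar ** (1 - n) - 1) / (n - 1)

def posterior(theta):
    """f_{Theta|X}(theta | x): theta^{-n} / Z on [max x_i, 1], else 0."""
    return np.where((theta >= xbar) & (theta <= 1), theta ** (-n) / Z, 0.0)

print(xbar, posterior(np.array([xbar, 0.8, 0.95])))
```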

9 Example iii): Beta priors on the bias of a coin
- Biased coin with probability of heads $\theta$
- $\theta$ unknown, modeled as an RV $\Theta$ with known prior PDF
- Consider $n$ independent tosses; $X$ = # heads observed
- Posterior PDF: $f_{\Theta|X}(\theta|k) = \dfrac{f_\Theta(\theta)\,p_{X|\Theta}(k|\theta)}{\int_0^1 f_\Theta(\theta')\,p_{X|\Theta}(k|\theta')\,d\theta'} = c\,f_\Theta(\theta)\,p_{X|\Theta}(k|\theta) = c\binom{n}{k} f_\Theta(\theta)\,\theta^k (1-\theta)^{n-k}$, where $1/c$ is the denominator
- Suppose $f_\Theta(\theta) = \dfrac{1}{B(\alpha,\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}$ if $0 < \theta < 1$, and $0$ otherwise ($\alpha > 0$, $\beta > 0$), where $B(\alpha,\beta) = \int_0^1 \theta^{\alpha-1}(1-\theta)^{\beta-1}\,d\theta = \dfrac{(\alpha-1)!\,(\beta-1)!}{(\alpha+\beta-1)!}$ (for integer $\alpha$, $\beta$)
- Then $f_{\Theta|X}(\theta|k) = \dfrac{d}{B(\alpha,\beta)}\,\theta^{k+\alpha-1}(1-\theta)^{n-k+\beta-1}$, $0 \le \theta \le 1$, for a normalizing constant $d$; i.e., the posterior is again a Beta distribution, now with parameters $k+\alpha$ and $n-k+\beta$
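
The fact that a Beta prior stays Beta after conditioning (conjugacy) can be used directly in code. A minimal sketch, assuming SciPy is available; the prior parameters and data are made-up values:

```python
from scipy.stats import beta

# Hypothetical prior Beta(alpha, beta) on the coin bias, and observed data
a, b = 2, 2          # prior parameters (assumed for illustration)
n, k = 10, 7         # n tosses, k heads

# Conjugacy: the posterior is Beta(k + alpha, n - k + beta)
posterior = beta(k + a, n - k + b)

print(posterior.mean())                 # (k + a) / (n + a + b) = 9/14
print(posterior.pdf(0.5))               # posterior density at theta = 0.5
```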

10 Example iv): Spam filtering
- An email message is either spam or legitimate
- $\Theta = 1$ if spam, $\Theta = 2$ if legitimate; priors $p_\Theta(1)$, $p_\Theta(2)$ known
- $\{w_1, \dots, w_n\}$: a collection of special words whose appearance suggests spam
- For each $i$, $X_i$ is a Bernoulli RV modeling the appearance of $w_i$: $X_i = 1$ if $w_i$ appears, $0$ otherwise
- Conditional probabilities $p_{X_i|\Theta}(x_i|1)$, $p_{X_i|\Theta}(x_i|2)$ known
- $X_1, \dots, X_n$ independent given $\Theta$
- Posterior probability: $P(\Theta = m \mid X_i = x_i,\ i = 1, \dots, n) = \dfrac{p_\Theta(m) \prod_{i=1}^n p_{X_i|\Theta}(x_i|m)}{\sum_{j=1}^2 p_\Theta(j) \prod_{i=1}^n p_{X_i|\Theta}(x_i|j)}$
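
A minimal sketch of this posterior computation; the priors and the per-word appearance probabilities below are hypothetical numbers, not from the lecture:

```python
import numpy as np

# Hypothetical model (all numbers assumed): 2 classes, n = 3 special words
prior = np.array([0.3, 0.7])                  # p_Theta(1)=spam, p_Theta(2)=legit
p_word = np.array([[0.8, 0.5, 0.6],           # P(X_i = 1 | spam)
                   [0.1, 0.2, 0.3]])          # P(X_i = 1 | legit)

x = np.array([1, 0, 1])                       # observed word appearances

# Conditional independence: product of per-word Bernoulli PMFs
like = np.prod(np.where(x == 1, p_word, 1 - p_word), axis=1)

posterior = prior * like / np.sum(prior * like)
print(posterior)                              # P(Theta = m | X = x), m = 1, 2
```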

21 Lecture 15: MAP and LMS Estimation. Hyang-Won Lee, Dept. of Internet & Multimedia Eng., Konkuk University.

22 Maximum a Posteriori Probability (MAP) Rule
Setting:
i) observation $x$ given
ii) $p_\Theta(\theta)$, $p_{X|\Theta}(x|\theta)$ given
iii) want to estimate $\Theta$
MAP rule: $\hat\theta = \arg\max_\theta p_{\Theta|X}(\theta|x)$ ($\Theta$ discrete); $\hat\theta = \arg\max_\theta f_{\Theta|X}(\theta|x)$ ($\Theta$ continuous)
Visualization on the board.
* For discrete $\Theta$, the MAP rule minimizes the probability of an incorrect decision.
Notes on computing $\hat\theta$:
i) From Bayes rule, the denominator is the same for all values of $\theta$; e.g., in $p_{\Theta|X}(\theta|x) = \dfrac{p_\Theta(\theta)\,p_{X|\Theta}(x|\theta)}{\sum_{\theta'} p_\Theta(\theta')\,p_{X|\Theta}(x|\theta')}$, the numerator is a function of $\theta$ while the denominator is constant w.r.t. $\theta$

23 MAP Rule (contd.)
ii) Only need to maximize the numerator:
$\hat\theta = \arg\max_\theta \begin{cases} p_\Theta(\theta)\,p_{X|\Theta}(x|\theta), & \Theta, X \text{ both discrete} \\ p_\Theta(\theta)\,f_{X|\Theta}(x|\theta), & \Theta \text{ discrete}, X \text{ continuous} \\ f_\Theta(\theta)\,p_{X|\Theta}(x|\theta), & \Theta \text{ continuous}, X \text{ discrete} \\ f_\Theta(\theta)\,f_{X|\Theta}(x|\theta), & \Theta, X \text{ both continuous} \end{cases}$
Example (spam filtering):
i) $\Theta = 1$ (spam), $\Theta = 2$ (legit); priors $p_\Theta(1)$, $p_\Theta(2)$
ii) $X_i$: Bernoulli, $X_i = 1$ if $w_i$ appears in the message, $0$ otherwise
iii) posterior probability: $P(\Theta = \theta \mid X_1 = x_1, \dots, X_n = x_n) = \dfrac{p_\Theta(\theta) \prod_{i=1}^n p_{X_i|\Theta}(x_i|\theta)}{\sum_{\theta'=1}^2 p_\Theta(\theta') \prod_{i=1}^n p_{X_i|\Theta}(x_i|\theta')}$, $\theta = 1, 2$

24 Spam Filtering Example (contd.)
iv) MAP: $\hat\theta = \arg\max_\theta P(\Theta = \theta \mid X_i = x_i,\ i = 1, \dots, n) = \arg\max_\theta p_\Theta(\theta) \prod_{i=1}^n p_{X_i|\Theta}(x_i|\theta)$
$\hat\theta = 1$ (spam) if $p_\Theta(1) \prod_{i=1}^n p_{X_i|\Theta}(x_i|1) > p_\Theta(2) \prod_{i=1}^n p_{X_i|\Theta}(x_i|2)$
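
A sketch of this MAP decision rule; working with log-probabilities gives the same argmax and avoids numerical underflow for large $n$. The priors and word probabilities are assumed values:

```python
import numpy as np

# Hypothetical priors and per-word appearance probabilities (assumed values)
log_prior = np.log([0.3, 0.7])                # spam, legit
p_word = np.array([[0.8, 0.5, 0.6],           # P(X_i = 1 | spam)
                   [0.1, 0.2, 0.3]])          # P(X_i = 1 | legit)

def map_decision(x):
    """Return 1 (spam) or 2 (legit), maximizing p(theta) * prod_i p(x_i|theta).

    Sums of log-probabilities equal the log of the product, so the argmax
    is unchanged but underflow is avoided for large n."""
    log_like = np.sum(np.where(x == 1, np.log(p_word), np.log(1 - p_word)),
                      axis=1)
    return int(np.argmax(log_prior + log_like)) + 1

print(map_decision(np.array([1, 0, 1])))      # -> 1 (spam) for these numbers
```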

25 Example: Romeo & Juliet I
i) Juliet is late on the first date by a random amount $X \sim U(0, \Theta)$
ii) $\Theta \sim U(0, 1)$ unknown
iii) posterior PDF (for $x \in [0, 1]$): $f_{\Theta|X}(\theta|x) = \dfrac{1}{\theta\,|\ln x|}$ if $x \le \theta \le 1$, and $0$ otherwise
MAP: $\hat\theta = x$ (the posterior is decreasing in $\theta$, so it is maximized at the left endpoint of its support)
Pictorial description on the board.

26 Probability of (In)Correct Decision
Hypothesis testing:
i) the unknown parameter takes one of a finite # of values, each corresponding to a competing hypothesis
ii) in the language of Bayesian inference: $\Theta \in \{\theta_1, \dots, \theta_m\}$ ($m = 2$: binary hypothesis testing); $\theta_i$: hypothesis $H_i$
Computing the probability of correct decision:
i) given the observation $X = x$
ii) MAP rule: $g_{MAP}(x)$ is the hypothesis selected by MAP given $X = x$
iii) probability of correct decision: $P(\Theta = g_{MAP}(x) \mid X = x)$; overall, $P(\Theta = g_{MAP}(X)) = \sum_i P(\Theta = \theta_i, X \in S_i)$, where $S_i = \{x : g_{MAP}(x) = \theta_i\}$
iv) probability of error: $\sum_i P(\Theta \ne \theta_i, X \in S_i)$

27 Example: Two biased coins
i) coin 1: prob. of heads $= p_1$; coin 2: prob. of heads $= p_2$
ii) choose a coin at random with equal probability
iii) want to infer its identity based on the outcome of a single toss
iv) $\Theta = 1$: coin 1, $\Theta = 2$: coin 2; $X = 1$: heads, $X = 0$: tails
v) MAP rule: choose coin 1 if $p_\Theta(1)\,p_{X|\Theta}(x|1) > p_\Theta(2)\,p_{X|\Theta}(x|2)$; e.g., with the given values $p_1 = \dots$, $p_2 = \dots$ (assuming $p_1 < p_2$), the rule chooses coin 1 when $x = 0$ (tails)
What is the probability of an incorrect decision?

28 Example: Two biased coins (contd.)
vi) $n$ coin tosses, $X$ = # heads:
$p_\Theta(1)\,p_{X|\Theta}(k|1) = \frac{1}{2}\binom{n}{k} p_1^k (1-p_1)^{n-k}$, $\quad p_\Theta(2)\,p_{X|\Theta}(k|2) = \frac{1}{2}\binom{n}{k} p_2^k (1-p_2)^{n-k}$
Choose coin 1 if $p_1^k (1-p_1)^{n-k} > p_2^k (1-p_2)^{n-k}$, otherwise coin 2; with $p_1 < p_2$, this holds exactly when $k \le k^*$ for some threshold $k^*$.
Pictorial description on the board.
vii) probability of error:
$P(\text{error}) = P(\Theta = 1, X > k^*) + P(\Theta = 2, X \le k^*) = p_\Theta(1) \sum_{k=k^*+1}^{n} \binom{n}{k} p_1^k (1-p_1)^{n-k} + p_\Theta(2) \sum_{k=0}^{k^*} \binom{n}{k} p_2^k (1-p_2)^{n-k}$
Pictorial description on the board.
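
A sketch that finds the threshold $k^*$ and evaluates $P(\text{error})$; the slide's numerical values are not preserved in the transcription, so $p_1$, $p_2$, and $n$ below are assumptions:

```python
from math import comb

# Hypothetical parameters: p1 < p2, n tosses, equal priors of 1/2
p1, p2, n = 0.4, 0.6, 10

def pmf(k, p):
    """Binomial PMF: P(X = k) for n tosses with heads probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# MAP rule: decide coin 1 iff p1^k (1-p1)^(n-k) > p2^k (1-p2)^(n-k);
# with p1 < p2 this holds exactly for k <= k* (a threshold on # heads)
k_star = max(k for k in range(n + 1)
             if p1**k * (1 - p1)**(n - k) > p2**k * (1 - p2)**(n - k))

# P(error) = P(coin 1, X > k*) + P(coin 2, X <= k*)
p_err = 0.5 * sum(pmf(k, p1) for k in range(k_star + 1, n + 1)) \
      + 0.5 * sum(pmf(k, p2) for k in range(k_star + 1))

print(k_star, p_err)              # threshold and probability of error
```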

29 Lecture 20: Classical Statistical Inference. Hyang-Won Lee, Dept. of Internet & Multimedia Eng., Konkuk University.

30 Classical Statistical Inference
Setup:
- $X$: random observation, with PMF $p_X(x; \theta)$ or PDF $f_X(x; \theta)$
- $\theta$: unknown constant; the dependence on $\theta$ means one probability model for each value of $\theta$
- Notation: $E_\theta[h(X)]$, $P_\theta(A)$ make the dependence on $\theta$ explicit
Inference methods:
i) maximum likelihood (ML) estimation
ii) linear regression
iii) likelihood ratio test
iv) significance testing

31 Maximum Likelihood Estimation
- $X = (X_1, \dots, X_n)$: vector of observations with joint PMF $p_X(x; \theta)$ (or joint PDF $f_X(x; \theta)$); $\hat\theta$: estimate of $\theta$
- ML: $\hat\theta = \arg\max_\theta p_X(x_1, \dots, x_n; \theta)$ ($X$ discrete); $\hat\theta = \arg\max_\theta f_X(x_1, \dots, x_n; \theta)$ ($X$ continuous)
- Likelihood function: $p_X(x; \theta)$ or $f_X(x; \theta)$
- Log-likelihood function: assuming the $X_i$'s are independent, $p_X(x_1, \dots, x_n; \theta) = \prod_{i=1}^n p_{X_i}(x_i; \theta)$, so $\log p_X(x; \theta) = \sum_{i=1}^n \log p_{X_i}(x_i; \theta)$
- Interpretation of $p_X(x; \theta)$: incorrect: the probability that the parameter is equal to $\theta$; correct: the probability of $X = x$ when the unknown parameter equals $\theta$. ML asks: for what value of $\theta$ are the observations $X = x$ most likely to arise?

32 Examples
ML vs. MAP: MAP maximizes $p_\Theta(\theta)\,p_{X|\Theta}(x|\theta)$; if $p_\Theta$ is flat and $p_{X|\Theta}(x|\theta) = p_X(x; \theta)$, this reduces to ML: $\arg\max_\theta p_X(x; \theta)$.
Example 1:
i) Juliet is always late by $X \sim U(0, \theta)$
ii) $\theta$: unknown constant
iii) ML: $f_X(x; \theta) = 1/\theta$ for $0 \le x \le \theta$, and $0$ otherwise, so $\hat\theta = x$ (compare w/ MAP)
Example 2:
i) biased coin (prob. of heads $\theta$ unknown)
ii) $X_1, \dots, X_n$: $n$ independent coin tosses ($X_i = 1$ if heads, $0$ if tails)
iii) $p_X(x; \theta) = \prod_{i=1}^n \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{\sum_i x_i}(1-\theta)^{n - \sum_i x_i}$; when $\sum_i x_i = k$ (i.e., $k$ heads out of $n$ tosses), $\hat\theta = \arg\max_\theta \theta^k (1-\theta)^{n-k} = k/n$
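
A numerical companion to Example 2 (assumed data: $k = 13$ heads in $n = 20$ tosses): maximize the log-likelihood over a grid and compare with the closed-form maximizer $k/n$:

```python
import numpy as np

n, k = 20, 13                                 # assumed data
theta = np.linspace(1e-6, 1 - 1e-6, 100001)   # grid avoiding the endpoints

# The log-likelihood has the same argmax as theta^k (1-theta)^(n-k)
loglik = k * np.log(theta) + (n - k) * np.log(1 - theta)

print(theta[np.argmax(loglik)], k / n)        # both ~ 0.65
```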

33 Estimation of Mean and Variance
Estimation of the mean and variance of an RV:
i) observations $X_1, \dots, X_n$ are i.i.d. with unknown common mean $\theta$ and known variance $v$
ii) sample mean $M_n = \dfrac{\sum_i X_i}{n}$; $E_\theta[M_n] = \theta$ (unbiased), and $M_n \xrightarrow{\text{i.p.}} \theta$ (weak law of large numbers; consistent)
iii) sample variance $\bar S_n^2 = \dfrac{1}{n} \sum_{i=1}^n (X_i - M_n)^2$, with $E_{(\theta,v)}[\bar S_n^2] = \dfrac{n-1}{n}\,v$ (asymptotically unbiased); $\hat S_n^2 = \dfrac{1}{n-1} \sum_{i=1}^n (X_i - M_n)^2$ is unbiased: $E_{(\theta,v)}[\hat S_n^2] = v$
Confidence interval: $P_\theta(\hat\Theta_n^- \le \theta \le \hat\Theta_n^+) \ge 1 - \alpha$; $[\hat\Theta_n^-, \hat\Theta_n^+]$: $(1-\alpha)$ confidence interval; $\hat\Theta_n^-$: lower estimator, $\hat\Theta_n^+$: upper estimator (*compare w/ a point estimator)
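
A quick empirical check of the bias claims in iii); the normal distribution and the values of $n$ and $v$ are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, v, trials = 5, 4.0, 200000                 # assumed values

# Many independent samples of size n with variance v
X = rng.normal(0.0, np.sqrt(v), size=(trials, n))
M = X.mean(axis=1, keepdims=True)             # sample mean of each sample
S2 = ((X - M) ** 2).sum(axis=1)               # sum of squared deviations

print(S2.mean() / n)          # ~ (n-1)/n * v = 3.2 (biased, 1/n factor)
print(S2.mean() / (n - 1))    # ~ v = 4.0 (unbiased, 1/(n-1) factor)
```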

34 Example
i) observations $X_1, \dots, X_n$ are i.i.d. normal with unknown mean $\theta$ and known variance $v$
ii) sample mean estimator $\hat\Theta_n = \dfrac{X_1 + \cdots + X_n}{n}$: normal with mean $\theta$ and variance $v/n$
iii) $\alpha = 0.05$: $\dfrac{\hat\Theta_n - \theta}{\sqrt{v/n}}$ is standard normal, so $P_\theta\left(\dfrac{|\hat\Theta_n - \theta|}{\sqrt{v/n}} \le 1.96\right) = 0.95$, i.e., $P_\theta\left(\hat\Theta_n - 1.96\sqrt{v/n} \le \theta \le \hat\Theta_n + 1.96\sqrt{v/n}\right) = 0.95$, and $\left[\hat\Theta_n - 1.96\sqrt{v/n},\ \hat\Theta_n + 1.96\sqrt{v/n}\right]$ is a 0.95 C.I.
Interpretation of a confidence interval: incorrect: $\theta$ is in the CI w.p. at least $1-\alpha$; correct: if we construct a confidence interval many times, about a fraction $1-\alpha$ of them are expected to contain $\theta$.
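
A simulation illustrating the correct interpretation (the values of $\theta$, $v$, and $n$ are assumptions): construct the 0.95 interval many times and measure how often it contains $\theta$:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, v, n, trials = 1.0, 4.0, 25, 100000    # assumed values

# Each row is one repetition of the experiment: n normal observations
X = rng.normal(theta, np.sqrt(v), size=(trials, n))
est = X.mean(axis=1)                          # sample mean estimator
half = 1.96 * np.sqrt(v / n)                  # half-width of the 0.95 interval

covered = (est - half <= theta) & (theta <= est + half)
print(covered.mean())                         # ~ 0.95, as the interpretation says
```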
