Lecture 18: Bayesian Inference
1 Lecture 18: Bayesian Inference
Hyang-Won Lee, Dept. of Internet & Multimedia Eng., Konkuk University
2 Bayesian Statistical Inference
Statistical inference: the process of estimating information about an unknown variable or model. For example, a biased coin whose head comes up w.p. $p$ (unknown).
Bayesian vs. classical inference:
- Bayesian: unknowns are random variables with known distributions; prior distribution $p_\Theta(\theta)$, posterior $p_{\Theta|X}(\theta|x)$ ($x$: observed data).
- Classical: unknowns are deterministic quantities that happen to be unknown; $\theta$ is a constant, to be estimated with some performance guarantee.
3 Bayesian Statistical Inference (contd.)
Inference (model/variable) problems:
i) Model inference: construct a model of a process and predict the future (e.g., weather forecasting).
ii) Variable inference: estimate an unknown parameter (e.g., GPS readings and the current position).
Example: noisy channel
i) A sequence of binary messages $S_i \in \{0,1\}$ is transmitted over a wireless channel.
ii) The receiver observes $X_i = a S_i + W_i$, $i = 1, \dots, n$, where $W_i \sim N(0, \sigma^2)$ and $a$ is a scalar.
iii) Model inference problem: $a$ unknown ($S_i$'s known).
iv) Variable inference problem: infer the $S_i$'s ($a$ known) based on the $X_i$'s.
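As a quick illustration, here is a minimal simulation sketch of the variable-inference side of this channel. The values of $a$, $\sigma$, and $n$ are assumptions chosen for the example, and the simple midpoint threshold is one natural decision rule when $a$ is known, not the rule prescribed by the slides.

```python
import numpy as np

# Minimal sketch of the noisy channel X_i = a*S_i + W_i (a, sigma, n are assumed).
rng = np.random.default_rng(0)
a, sigma, n = 2.0, 0.5, 10
S = rng.integers(0, 2, size=n)        # binary messages S_i in {0, 1}
W = rng.normal(0.0, sigma, size=n)    # noise W_i ~ N(0, sigma^2)
X = a * S + W                         # receiver observations

# Variable inference with a known: decide S_i = 1 iff X_i is closer to a than to 0.
S_hat = (X > a / 2).astype(int)
print("true S:", S)
print("est. S:", S_hat)
```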
4 Bayesian Statistical Inference
Types of statistical inference problems:
- Estimation (of an unknown constant or RV)
- Hypothesis testing (binary or m-ary)
Bayesian inference methods:
i) Maximum a posteriori probability (MAP) rule
ii) Least mean squares (LMS) estimation
iii) Linear least mean squares estimation
5 Bayesian Inference and Posterior Distribution
Pictorial introduction.
Bayes' rule:
(i) $\Theta$ discrete, $X$ discrete: $p_{\Theta|X}(\theta|x) = \dfrac{p_\Theta(\theta)\, p_{X|\Theta}(x|\theta)}{\sum_{\theta'} p_\Theta(\theta')\, p_{X|\Theta}(x|\theta')}$
(ii) $\Theta$ discrete, $X$ continuous: $p_{\Theta|X}(\theta|x) = \dfrac{p_\Theta(\theta)\, f_{X|\Theta}(x|\theta)}{\sum_{\theta'} p_\Theta(\theta')\, f_{X|\Theta}(x|\theta')}$
(iii) $\Theta$ continuous, $X$ discrete: $f_{\Theta|X}(\theta|x) = \dfrac{f_\Theta(\theta)\, p_{X|\Theta}(x|\theta)}{\int f_\Theta(\theta')\, p_{X|\Theta}(x|\theta')\, d\theta'}$
(iv) $\Theta$ continuous, $X$ continuous: $f_{\Theta|X}(\theta|x) = \dfrac{f_\Theta(\theta)\, f_{X|\Theta}(x|\theta)}{\int f_\Theta(\theta')\, f_{X|\Theta}(x|\theta')\, d\theta'}$
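Case (i) is easy to carry out numerically. The following sketch computes a discrete posterior from an assumed prior and likelihood; all the numbers are invented for illustration.

```python
import numpy as np

# Case (i): Theta and X both discrete. Prior and likelihood values are assumed.
p_theta = np.array([0.5, 0.3, 0.2])      # prior p_Theta over three candidate thetas
p_x_given = np.array([0.9, 0.5, 0.1])    # p_{X|Theta}(x | theta) for the observed x

numer = p_theta * p_x_given              # numerator of Bayes' rule, per theta
posterior = numer / numer.sum()          # denominator: sum over theta'
print(posterior, posterior.sum())        # posterior PMF; sums to 1
```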
6 Conditional Probability Revisited
Four versions of conditional probability:
(i) $\Theta$ discrete, $X$ discrete: $p_{\Theta|X}(\theta|x) = \dfrac{p_{\Theta,X}(\theta,x)}{p_X(x)}$
(ii) $\Theta$ discrete, $X$ continuous:
$p_{\Theta|X}(\theta|x) = P(\Theta = \theta \mid X = x) = \lim_{\delta \to 0} P(\Theta = \theta \mid x \le X \le x + \delta) = \lim_{\delta \to 0} \dfrac{p_\Theta(\theta)\, P(x \le X \le x + \delta \mid \Theta = \theta)}{P(x \le X \le x + \delta)} = \dfrac{p_\Theta(\theta)\, f_{X|\Theta}(x|\theta)}{\sum_{\theta'} p_\Theta(\theta')\, f_{X|\Theta}(x|\theta')} = \dfrac{p_\Theta(\theta)\, f_{X|\Theta}(x|\theta)}{f_X(x)}$
(iii) $\Theta$ continuous, $X$ discrete:
$f_{\Theta|X}(\theta|x) = \lim_{\delta \to 0} \dfrac{P(\theta \le \Theta \le \theta + \delta \mid X = x)}{\delta} = \lim_{\delta \to 0} \dfrac{P(\theta \le \Theta \le \theta + \delta)\, P(X = x \mid \theta \le \Theta \le \theta + \delta)}{\delta\, P(X = x)} = \dfrac{f_\Theta(\theta)\, p_{X|\Theta}(x|\theta)}{p_X(x)}$
(iv) $\Theta$ continuous, $X$ continuous: $f_{\Theta|X}(\theta|x) = \dfrac{f_\Theta(\theta)\, f_{X|\Theta}(x|\theta)}{f_X(x)}$
7 Example: Romeo & Juliet I
- Juliet will be late on any date by a random amount $X \sim U(0, \theta)$.
- $\theta$ unknown, modeled as an RV $\Theta \sim U(0, 1)$.
- Assume that Juliet was late by an amount $x$ on the 1st date.
- How do we update the distribution of $\Theta$?
What we know:
1) The prior PDF: $f_\Theta(\theta) = 1$ for $0 \le \theta \le 1$, and $0$ otherwise.
2) The conditional PDF of the observation: $f_{X|\Theta}(x|\theta) = 1/\theta$ for $0 \le x \le \theta$, and $0$ otherwise.
Posterior PDF:
$f_{\Theta|X}(\theta|x) = \dfrac{f_\Theta(\theta)\, f_{X|\Theta}(x|\theta)}{\int_0^1 f_\Theta(\theta')\, f_{X|\Theta}(x|\theta')\, d\theta'}$
so $f_{\Theta|X}(\theta|x) = 0$ if $\theta < x$ or $\theta > 1$, and for $x \le \theta \le 1$:
$f_{\Theta|X}(\theta|x) = \dfrac{1/\theta}{\int_x^1 (1/\theta')\, d\theta'} = \dfrac{1}{\theta \ln(1/x)}$
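A quick numerical sanity check of this posterior, which should integrate to one over its support $[x, 1]$; the observed value $x = 0.3$ is an assumption for the example.

```python
import numpy as np
from scipy.integrate import quad

# Check that the posterior 1/(theta * ln(1/x)) integrates to 1 over [x, 1].
x = 0.3                                          # assumed observation
posterior = lambda theta: 1.0 / (theta * np.log(1.0 / x))
total, _ = quad(posterior, x, 1.0)
print(total)                                     # ~1.0
```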
8 Example ii): Romeo & Juliet II
- Observe the first $n$ dates.
- Juliet is late by $X_1, X_2, \dots, X_n \sim U(0, \theta)$ (independent) given $\Theta = \theta$.
- Let $X = [X_1, X_2, \dots, X_n]$, $x = [x_1, x_2, \dots, x_n]$, and write $\bar{x} = \max\{x_1, \dots, x_n\}$.
- Conditional PDF:
$f_{X|\Theta}(x|\theta) = f_{X_1|\Theta}(x_1|\theta) \cdots f_{X_n|\Theta}(x_n|\theta) = 1/\theta^n$ if $\bar{x} \le \theta$, and $0$ otherwise.
- Posterior PDF:
$f_{\Theta|X}(\theta|x) = \dfrac{f_\Theta(\theta)\, f_{X|\Theta}(x|\theta)}{\int_0^1 f_\Theta(\theta')\, f_{X|\Theta}(x|\theta')\, d\theta'} = \dfrac{1/\theta^n}{\int_{\bar{x}}^1 \big(1/(\theta')^n\big)\, d\theta'}$ for $\bar{x} \le \theta \le 1$, and $0$ otherwise.
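The normalizing integral, a step the slide leaves unevaluated, has a closed form for $n \ge 2$:

```latex
\int_{\bar{x}}^{1} \frac{d\theta'}{(\theta')^{n}} = \frac{\bar{x}^{\,1-n} - 1}{n-1},
\qquad\text{so}\qquad
f_{\Theta \mid X}(\theta \mid x) = \frac{(n-1)\,\theta^{-n}}{\bar{x}^{\,1-n} - 1},
\quad \bar{x} \le \theta \le 1 .
```

For $n = 1$ the integral is instead $\ln(1/\bar{x})$, recovering the posterior of the previous slide.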
9 Example iii): Beta priors on the bias of a coin
- Biased coin with probability of heads $\theta$.
- $\theta$ unknown, modeled as an RV $\Theta$ with known prior PDF.
- Consider $n$ independent tosses; $X$ = # of heads observed.
- Posterior PDF, with $1/c = \int_0^1 f_\Theta(\theta')\, p_{X|\Theta}(k|\theta')\, d\theta'$:
$f_{\Theta|X}(\theta|k) = \dfrac{f_\Theta(\theta)\, p_{X|\Theta}(k|\theta)}{\int_0^1 f_\Theta(\theta')\, p_{X|\Theta}(k|\theta')\, d\theta'} = c\, f_\Theta(\theta)\, p_{X|\Theta}(k|\theta) = c \binom{n}{k} f_\Theta(\theta)\, \theta^k (1-\theta)^{n-k}$
- Suppose the prior is Beta$(\alpha, \beta)$ with $\alpha > 0$, $\beta > 0$:
$f_\Theta(\theta) = \dfrac{1}{B(\alpha, \beta)}\, \theta^{\alpha-1} (1-\theta)^{\beta-1}$ for $0 < \theta < 1$, and $0$ otherwise,
where $B(\alpha, \beta) = \int_0^1 \theta^{\alpha-1} (1-\theta)^{\beta-1}\, d\theta = \dfrac{(\alpha-1)!\, (\beta-1)!}{(\alpha+\beta-1)!}$ (the factorial form holds for integer $\alpha, \beta$).
Then, for a normalizing constant $d$,
$f_{\Theta|X}(\theta|k) = \dfrac{d}{B(\alpha, \beta)}\, \theta^{k+\alpha-1} (1-\theta)^{n-k+\beta-1}$, $0 \le \theta \le 1$,
i.e., the posterior is again a Beta distribution, Beta$(k+\alpha,\ n-k+\beta)$.
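This conjugacy makes the posterior update a one-liner. A sketch with assumed numbers: a Beta(2, 2) prior and 7 heads in 10 tosses.

```python
from scipy.stats import beta

# Conjugate Beta update: prior Beta(a, b) + k heads in n tosses -> Beta(a+k, b+n-k).
a, b = 2.0, 2.0            # assumed prior hyperparameters
n, k = 10, 7               # assumed data: 7 heads out of 10 tosses
posterior = beta(a + k, b + (n - k))

print(posterior.mean())            # posterior mean (a+k)/(a+b+n) = 9/14
print(posterior.interval(0.95))    # central 95% posterior interval for theta
```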
10 Example iv): Spam filtering
- An e-mail message is spam or legitimate.
- $\Theta = 1$ if spam, $\Theta = 2$ if legitimate, with priors $p_\Theta(1)$, $p_\Theta(2)$.
- $\{w_1, \dots, w_n\}$: a collection of special words whose appearance suggests spam.
- For each $i$, $X_i$ is a Bernoulli RV modeling the appearance of $w_i$: $X_i = 1$ if $w_i$ appears, $0$ otherwise.
- The conditional probabilities $p_{X_i|\Theta}(x_i|1)$ and $p_{X_i|\Theta}(x_i|2)$ are known.
- $X_1, \dots, X_n$ are independent given $\Theta$.
- Posterior probability:
$P(\Theta = m \mid X_i = x_i,\ i = 1, \dots, n) = \dfrac{p_\Theta(m) \prod_{i=1}^n p_{X_i|\Theta}(x_i|m)}{\sum_{j=1}^2 p_\Theta(j) \prod_{i=1}^n p_{X_i|\Theta}(x_i|j)}$
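A small numerical sketch of this posterior; the priors and per-word probabilities below are invented for illustration.

```python
import numpy as np

# Spam posterior under conditional independence (all probabilities assumed).
prior = np.array([0.4, 0.6])            # [p_Theta(1)=spam, p_Theta(2)=legit]
p_word = np.array([[0.8, 0.1],          # P(X_i = 1 | Theta); rows = words w_1..w_3,
                   [0.6, 0.2],          # columns = [spam, legit]
                   [0.3, 0.3]])
x = np.array([1, 0, 1])                 # observed appearances of the words

# Per-word likelihood p^x * (1-p)^(1-x), multiplied across words per hypothesis.
like = np.where(x[:, None] == 1, p_word, 1.0 - p_word).prod(axis=0)
posterior = prior * like / (prior * like).sum()
print(posterior)                        # [P(spam | x), P(legit | x)]
```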
21 Lecture 15: MAP and LMS Estimation
Hyang-Won Lee, Dept. of Internet & Multimedia Eng., Konkuk University
22 Maximum a Posteriori Probability (MAP) Rule
MAP rule:
i) Observation $x$ given.
ii) $p_\Theta(\theta)$, $p_{X|\Theta}(x|\theta)$ given.
iii) Want to estimate $\Theta$.
MAP rule:
$\hat\theta = \arg\max_\theta p_{\Theta|X}(\theta|x)$ ($\Theta$ discrete)
$\hat\theta = \arg\max_\theta f_{\Theta|X}(\theta|x)$ ($\Theta$ continuous)
Visualization on the board.
* For discrete $\Theta$, the MAP rule minimizes the probability of an incorrect decision.
Notes on the computation of $\hat\theta$:
i) From Bayes' rule, the denominator is the same for all values of $\theta$; e.g., in
$p_{\Theta|X}(\theta|x) = \dfrac{p_\Theta(\theta)\, p_{X|\Theta}(x|\theta)}{\sum_{\theta'} p_\Theta(\theta')\, p_{X|\Theta}(x|\theta')}$
the numerator is a function of $\theta$ while the denominator is constant w.r.t. $\theta$.
23 MAP Rule (contd.)
ii) Only need to maximize the numerator:
$\hat\theta = \arg\max_\theta \begin{cases} p_\Theta(\theta)\, p_{X|\Theta}(x|\theta), & \Theta, X \text{ both discrete} \\ p_\Theta(\theta)\, f_{X|\Theta}(x|\theta), & \Theta \text{ discrete, } X \text{ continuous} \\ f_\Theta(\theta)\, p_{X|\Theta}(x|\theta), & \Theta \text{ continuous, } X \text{ discrete} \\ f_\Theta(\theta)\, f_{X|\Theta}(x|\theta), & \Theta, X \text{ both continuous} \end{cases}$
Example (spam filtering):
i) $\Theta = 1$ (spam), $\Theta = 2$ (legit), with priors $p_\Theta(1)$, $p_\Theta(2)$.
ii) $X_i$: Bernoulli; $X_i = 1$ if $w_i$ appears in the message, $0$ otherwise.
iii) Posterior probability:
$P(\Theta = \theta \mid X_1 = x_1, \dots, X_n = x_n) = \dfrac{p_\Theta(\theta) \prod_{i=1}^n p_{X_i|\Theta}(x_i|\theta)}{\sum_{\theta'=1}^2 p_\Theta(\theta') \prod_{i=1}^n p_{X_i|\Theta}(x_i|\theta')}$, $\theta = 1, 2$
24 Spam Filtering Example (contd.)
iv) MAP:
$\hat\theta = \arg\max_\theta P(\Theta = \theta \mid X_i = x_i,\ i = 1, \dots, n) = \arg\max_\theta p_\Theta(\theta) \prod_{i=1}^n p_{X_i|\Theta}(x_i|\theta)$
Decide $\hat\theta = 1$ (spam) if $p_\Theta(1) \prod_{i=1}^n p_{X_i|\Theta}(x_i|1) > p_\Theta(2) \prod_{i=1}^n p_{X_i|\Theta}(x_i|2)$.
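Continuing the numeric spam sketch from earlier (same assumed numbers), the MAP decision needs only the two unnormalized scores, since the denominator is common to both hypotheses:

```python
import numpy as np

# MAP decision for the spam sketch: compare unnormalized scores, skip the denominator.
prior = np.array([0.4, 0.6])                 # assumed priors [spam, legit]
p_word = np.array([[0.8, 0.1],
                   [0.6, 0.2],
                   [0.3, 0.3]])
x = np.array([1, 0, 1])

score = prior * np.where(x[:, None] == 1, p_word, 1.0 - p_word).prod(axis=0)
theta_hat = 1 + int(np.argmax(score))        # 1 = spam, 2 = legit
print(theta_hat)
```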
25 Example: Romeo & Juliet I
i) Juliet is late on the first date by a random amount $X \sim U(0, \Theta)$.
ii) $\Theta \sim U(0, 1)$ unknown.
iii) Posterior PDF (for $x \in [0, 1]$):
$f_{\Theta|X}(\theta|x) = \dfrac{1}{\theta \ln(1/x)}$ if $x \le \theta \le 1$, and $0$ otherwise.
MAP: $\hat\theta = x$, since the posterior is decreasing in $\theta$ over $[x, 1]$.
Pictorial description on the board.
26 Probability of (In)Correct Decision
Hypothesis testing:
i) The unknown parameter takes one of a finite # of values, each corresponding to a competing hypothesis.
ii) In the language of Bayesian inference: $\Theta \in \{\theta_1, \dots, \theta_m\}$ ($m = 2$: binary hypothesis testing); $\theta_i$: hypothesis $H_i$.
Computing the probability of a correct decision:
i) Given observation $X = x$.
ii) MAP rule: $g_{\text{MAP}}(x)$ = hypothesis selected by MAP given $X = x$.
iii) Probability of a correct decision: $P(\Theta = g_{\text{MAP}}(x) \mid X = x)$, and overall
$P(\Theta = g_{\text{MAP}}(X)) = \sum_i P(\Theta = \theta_i,\ X \in S_i)$, where $S_i = \{x : g_{\text{MAP}}(x) = \theta_i\}$.
iv) Probability of error: $\sum_i P(\Theta \ne \theta_i,\ X \in S_i)$.
27 Example: Two Biased Coins
i) Coin 1: prob. of heads $= p_1$; coin 2: prob. of heads $= p_2$.
ii) Choose a coin at random with equal probability.
iii) Want to infer its identity based on the outcome of a single toss.
iv) $\Theta = 1$: coin 1, $\Theta = 2$: coin 2; $X = 1$: heads, $X = 0$: tails.
v) MAP rule: decide coin 1 if $p_\Theta(1)\, p_{X|\Theta}(x|1) > p_\Theta(2)\, p_{X|\Theta}(x|2)$. With equal priors, for $x =$ tails this reduces to deciding coin 1 iff $1 - p_1 > 1 - p_2$, i.e., iff $p_1 < p_2$.
What is the probability of an incorrect decision?
28 Example: Two Biased Coins (contd.)
vi) $n$ coin tosses, $X$ = # of heads:
$p_\Theta(1)\, p_{X|\Theta}(k|1) = \frac{1}{2} \binom{n}{k} p_1^k (1-p_1)^{n-k}$
$p_\Theta(2)\, p_{X|\Theta}(k|2) = \frac{1}{2} \binom{n}{k} p_2^k (1-p_2)^{n-k}$
Decide coin 1 if $p_1^k (1-p_1)^{n-k} > p_2^k (1-p_2)^{n-k}$; otherwise coin 2.
Pictorial description on the board.
vii) Probability of error (with $k^*$ the decision threshold and $p_1 < p_2$, so coin 1 is chosen when $X \le k^*$):
$P(\text{error}) = P(\Theta = 1,\ X > k^*) + P(\Theta = 2,\ X \le k^*) = p_\Theta(1) \sum_{k=k^*+1}^{n} \binom{n}{k} p_1^k (1-p_1)^{n-k} + p_\Theta(2) \sum_{k=0}^{k^*} \binom{n}{k} p_2^k (1-p_2)^{n-k}$
Pictorial description on the board.
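A sketch that evaluates this error probability without deriving $k^*$ explicitly, by checking directly which hypothesis each count $k$ favors; $p_1$, $p_2$, and $n$ are assumed values.

```python
import numpy as np
from scipy.stats import binom

# Error probability of the MAP rule for n tosses (p1, p2, n assumed; equal priors).
p1, p2, n = 0.4, 0.6, 25
k = np.arange(n + 1)
pmf1, pmf2 = binom.pmf(k, n, p1), binom.pmf(k, n, p2)

decide_1 = pmf1 > pmf2                      # counts k for which MAP picks coin 1
p_error = 0.5 * pmf1[~decide_1].sum() + 0.5 * pmf2[decide_1].sum()
print(p_error)
```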
29 Lecture 20: Classical Statistical Inference
Hyang-Won Lee, Dept. of Internet & Multimedia Eng., Konkuk University
30 Classical Statistical Inference
Setup:
- $X$: random observation with PMF $p_X(x;\theta)$ or PDF $f_X(x;\theta)$.
- $\theta$: unknown constant; the dependence on $\theta$ means there is one probability model for each value of $\theta$.
Notation: $E_\theta[h(X)]$, $P_\theta(A)$ (the subscript marks the dependence on $\theta$).
Inference methods:
i) Maximum likelihood (ML) estimation
ii) Linear regression
iii) Likelihood ratio test
iv) Significance testing
31 Maximum Likelihood Estimation
- $X = (X_1, \dots, X_n)$: vector of observations, with joint PMF $p_X(x;\theta)$ (or joint PDF $f_X(x;\theta)$).
- $\hat\theta$: estimate of $\theta$.
ML:
$\hat\theta = \arg\max_\theta p_X(x_1, \dots, x_n; \theta)$ ($X$ discrete)
$\hat\theta = \arg\max_\theta f_X(x_1, \dots, x_n; \theta)$ ($X$ continuous)
- Likelihood function: $p_X(x;\theta)$ or $f_X(x;\theta)$.
- Log-likelihood function: if the $X_i$'s are independent, $p_X(x_1, \dots, x_n; \theta) = \prod_{i=1}^n p_{X_i}(x_i; \theta)$, so $\log p_X(x;\theta) = \sum_{i=1}^n \log p_{X_i}(x_i; \theta)$.
Interpretation of $p_X(x;\theta)$:
- Incorrect: the probability that the unknown parameter equals $\theta$.
- Correct: the probability of $X = x$ when the unknown parameter is $\theta$.
- ML: for what value of $\theta$ are the observations $X = x$ most likely to arise?
32 Examples
ML vs. MAP:
- MAP: $\arg\max_\theta p_\Theta(\theta)\, p_{X|\Theta}(x|\theta)$. With a flat prior $p_\Theta$ and $p_{X|\Theta}(x|\theta) = p_X(x;\theta)$, this reduces to ML: $\arg\max_\theta p_X(x;\theta)$.
Example 1:
i) Juliet is always late by $X \sim U(0, \theta)$.
ii) $\theta$: unknown constant.
iii) ML: $\hat\theta$? $f_X(x;\theta) = 1/\theta$ for $0 \le x \le \theta$, and $0$ otherwise, so $\hat\theta = x$ (compare with MAP).
Example 2:
i) Biased coin (prob. of heads $\theta$ unknown).
ii) $X_1, \dots, X_n$: $n$ independent coin tosses ($X_i = 1$ if heads, $0$ if tails).
iii) $p_X(x;\theta) = \prod_{i=1}^n \theta^{x_i} (1-\theta)^{1-x_i} = \theta^{\sum_i x_i} (1-\theta)^{n - \sum_i x_i}$.
When $\sum_i x_i = k$ (i.e., $k$ heads out of $n$ tosses): $\hat\theta = \arg\max_\theta\, \theta^k (1-\theta)^{n-k} = k/n$.
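A sketch confirming Example 2 numerically: maximizing the log-likelihood $k \log\theta + (n-k)\log(1-\theta)$ recovers $\hat\theta = k/n$. The values of $n$ and $k$ are assumed.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Coin MLE: maximize the log-likelihood numerically and compare with k/n.
n, k = 20, 13                                    # assumed tosses and head count
neg_loglik = lambda t: -(k * np.log(t) + (n - k) * np.log(1.0 - t))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, k / n)                              # both ~0.65
```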
33 Estimation of Mean and Variance
Estimation of the mean and variance of an RV:
i) Observations $X_1, \dots, X_n$ are i.i.d. with an unknown common mean $\theta$ and known variance $v$.
ii) Sample mean: $M_n = \dfrac{\sum_i X_i}{n}$. $E_\theta[M_n] = \theta$, so $M_n$ is unbiased; $M_n \xrightarrow{\text{i.p.}} \theta$ (weak law of large numbers), so $M_n$ is consistent.
iii) Sample variance: $\bar{S}_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - M_n)^2$, with $E_{(\theta,v)}[\bar{S}_n^2] = \frac{n-1}{n}\, v$ (asymptotically unbiased); $\hat{S}_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - M_n)^2$ is unbiased, $E_{(\theta,v)}[\hat{S}_n^2] = v$.
Confidence interval: $P_\theta(\hat\Theta_n^- \le \theta \le \hat\Theta_n^+) \ge 1 - \alpha$; $[\hat\Theta_n^-, \hat\Theta_n^+]$ is a $1-\alpha$ confidence interval, with $\hat\Theta_n^-$ the lower estimator and $\hat\Theta_n^+$ the upper estimator (*compare with a point estimator).
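A simulation sketch of the bias claims in iii); the distribution, $n$, and true variance $v$ are assumed for the demo.

```python
import numpy as np

# Compare the 1/n and 1/(n-1) variance estimators over many repetitions.
rng = np.random.default_rng(1)
n, v = 5, 4.0
samples = rng.normal(0.0, np.sqrt(v), size=(200_000, n))

m = samples.mean(axis=1, keepdims=True)
ss = ((samples - m) ** 2).sum(axis=1)
print((ss / n).mean(), (n - 1) / n * v)   # biased estimator: mean ~ (n-1)/n * v = 3.2
print((ss / (n - 1)).mean(), v)           # unbiased estimator: mean ~ v = 4.0
```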
34 Example
i) Observations $X_1, \dots, X_n$ are i.i.d. normal, with unknown mean $\theta$ and known variance $v$.
ii) Sample mean estimator: $\hat\Theta_n = \dfrac{X_1 + \dots + X_n}{n}$, which is normal with mean $\theta$ and variance $v/n$.
iii) $\alpha = 0.05$: $\dfrac{\hat\Theta_n - \theta}{\sqrt{v/n}}$ is standard normal, and $P_\theta\!\left(\dfrac{|\hat\Theta_n - \theta|}{\sqrt{v/n}} \le 1.96\right) = 0.95$, so
$P_\theta\!\left(\hat\Theta_n - 1.96 \sqrt{v/n} \le \theta \le \hat\Theta_n + 1.96 \sqrt{v/n}\right) = 0.95$
and $\left[\hat\Theta_n - 1.96 \sqrt{v/n},\ \hat\Theta_n + 1.96 \sqrt{v/n}\right]$ is a 0.95 C.I.
Interpretation of a confidence interval:
- Incorrect: $\theta$ is in the CI w.p. at least $1 - \alpha$.
- Correct: if we construct such a confidence interval many times, about $1 - \alpha$ of them are expected to contain $\theta$.
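A sketch building this 0.95 interval from simulated data; the true mean, variance, and $n$ are assumed for the demo.

```python
import numpy as np

# 95% CI for a normal mean with known variance v (true values assumed for the demo).
rng = np.random.default_rng(2)
theta_true, v, n = 10.0, 4.0, 50
x = rng.normal(theta_true, np.sqrt(v), size=n)

m = x.mean()                        # sample mean estimator
half = 1.96 * np.sqrt(v / n)        # half-width from the standard normal quantile
print(m - half, m + half)           # the 0.95 confidence interval
```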