
1 Bayesian statistics. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

2 Frequentist vs Bayesian statistics. In frequentist statistics the data are modeled as realizations from a distribution that depends on deterministic parameters. In Bayesian statistics the parameters are modeled as random variables. This allows us to quantify our prior uncertainty and to incorporate additional information.

3 Outline: Learning Bayesian models, Conjugate priors, Bayesian estimators.

4 Prior distribution and likelihood. The data $\vec{x} \in \mathbb{R}^n$ are a realization of a random vector $\vec{X}$, which depends on a vector of parameters $\Theta$. Modeling choices: Prior distribution: distribution of $\Theta$ encoding our uncertainty about the model before seeing the data. Likelihood: conditional distribution of $\vec{X}$ given $\Theta$.

5 Posterior distribution. The posterior distribution is the conditional distribution of $\Theta$ given $\vec{X}$. Evaluating the posterior at the data $\vec{x}$ allows us to update our uncertainty about $\Theta$ using the data.

6 Bernoulli distribution. Goal: estimate the parameter of a Bernoulli distribution from iid data. We consider two different Bayesian estimators, $\Theta_1$ and $\Theta_2$:
1. $\Theta_1$ is a conservative estimator with a uniform prior pdf
$$f_{\Theta_1}(\theta) = \begin{cases} 1 & \text{for } 0 \le \theta \le 1 \\ 0 & \text{otherwise} \end{cases}$$
2. $\Theta_2$ has a prior pdf skewed towards 1
$$f_{\Theta_2}(\theta) = \begin{cases} 2\,\theta & \text{for } 0 \le \theta \le 1 \\ 0 & \text{otherwise} \end{cases}$$

7 Prior distributions [figure: the two prior pdfs]

8–9 Bernoulli distribution: likelihood. The data are assumed to be iid, so the likelihood is
$$p_{\vec{X} \mid \Theta}(\vec{x} \mid \theta) = \theta^{n_1} (1 - \theta)^{n_0},$$
where $n_0$ is the number of zeros and $n_1$ the number of ones.

10–14 Bernoulli distribution: posterior distribution (uniform prior).
$$\begin{aligned}
f_{\Theta_1 \mid \vec{X}}(\theta \mid \vec{x})
&= \frac{f_{\Theta_1}(\theta)\, p_{\vec{X} \mid \Theta_1}(\vec{x} \mid \theta)}{p_{\vec{X}}(\vec{x})} \\
&= \frac{f_{\Theta_1}(\theta)\, p_{\vec{X} \mid \Theta_1}(\vec{x} \mid \theta)}{\int_u f_{\Theta_1}(u)\, p_{\vec{X} \mid \Theta_1}(\vec{x} \mid u)\, \mathrm{d}u} \\
&= \frac{\theta^{n_1} (1 - \theta)^{n_0}}{\int_u u^{n_1} (1 - u)^{n_0}\, \mathrm{d}u} \\
&= \frac{\theta^{n_1} (1 - \theta)^{n_0}}{\beta(n_1 + 1, n_0 + 1)}, \qquad \beta(a, b) := \int_u u^{a-1} (1 - u)^{b-1}\, \mathrm{d}u
\end{aligned}$$

15–18 Bernoulli distribution: posterior distribution (skewed prior).
$$\begin{aligned}
f_{\Theta_2 \mid \vec{X}}(\theta \mid \vec{x})
&= \frac{f_{\Theta_2}(\theta)\, p_{\vec{X} \mid \Theta_2}(\vec{x} \mid \theta)}{p_{\vec{X}}(\vec{x})} \\
&= \frac{\theta^{n_1 + 1} (1 - \theta)^{n_0}}{\int_u u^{n_1 + 1} (1 - u)^{n_0}\, \mathrm{d}u} \\
&= \frac{\theta^{n_1 + 1} (1 - \theta)^{n_0}}{\beta(n_1 + 2, n_0 + 1)}
\end{aligned}$$
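Both posteriors are beta densities, so they can be evaluated with standard libraries. A minimal sketch, assuming NumPy and SciPy are available; the counts n0 and n1 are illustrative, not data from the slides:

```python
import numpy as np
from scipy.stats import beta

# Illustrative counts (hypothetical data): n0 zeros, n1 ones
n0, n1 = 3, 7
theta = np.linspace(0, 1, 1000)

# Posterior under the uniform prior: beta(n1 + 1, n0 + 1)
post_uniform = beta.pdf(theta, n1 + 1, n0 + 1)
# Posterior under the skewed prior f(theta) = 2 theta: beta(n1 + 2, n0 + 1)
post_skewed = beta.pdf(theta, n1 + 2, n0 + 1)

# Posterior means (a beta(a, b) distribution has mean a / (a + b))
print((n1 + 1) / (n0 + n1 + 2), (n1 + 2) / (n0 + n1 + 3))
```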

19 Bernoulli distribution: $n_0 = 1$, $n_1 = \dots$ [figure: posterior pdfs for both priors]

20 Bernoulli distribution: $n_0 = 3$, $n_1 = \dots$ [figure: posterior pdfs for both priors]

21 Bernoulli distribution: $n_0 = 91$, $n_1 = \dots$ [figure: posterior pdfs, marking the posterior mean (uniform prior), the posterior mean (skewed prior), and the ML estimator]

22 Outline: Learning Bayesian models, Conjugate priors, Bayesian estimators.

23 Beta distribution. The pdf of a beta distribution with parameters $a$ and $b$ is defined as
$$f_\beta(\theta; a, b) := \begin{cases} \dfrac{\theta^{a-1} (1 - \theta)^{b-1}}{\beta(a, b)} & \text{if } 0 \le \theta \le 1, \\ 0 & \text{otherwise,} \end{cases} \qquad \beta(a, b) := \int_u u^{a-1} (1 - u)^{b-1}\, \mathrm{d}u$$
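The normalizing constant $\beta(a, b)$ is available in SciPy as scipy.special.beta. A quick sketch checking it against direct numerical integration; the parameter values are arbitrary:

```python
from scipy.special import beta as beta_fn
from scipy.integrate import quad

# Check that the normalizing constant beta(a, b) matches the
# integral of u^(a-1) (1-u)^(b-1) over [0, 1]
a, b = 2.0, 5.0  # illustrative parameter values
integral, _ = quad(lambda u: u**(a - 1) * (1 - u)**(b - 1), 0, 1)
print(integral, beta_fn(a, b))  # the two values agree
```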

24 Learning a Bernoulli distribution. The first prior is beta with parameters $a = 1$ and $b = 1$. The second prior is beta with parameters $a = 2$ and $b = 1$. The posteriors are beta with parameters $a = n_1 + 1$, $b = n_0 + 1$ and $a = n_1 + 2$, $b = n_0 + 1$, respectively.

25 Conjugate priors. A conjugate family of distributions for a certain likelihood satisfies the following property: if the prior belongs to the family, the posterior also belongs to the family. Beta distributions are conjugate priors when the likelihood is binomial.

26–31 The beta distribution is conjugate to the binomial likelihood. Assume $\Theta$ is beta with parameters $a$ and $b$, and $X$ is binomial with parameters $n$ and $\Theta$. Then
$$\begin{aligned}
f_{\Theta \mid X}(\theta \mid x)
&= \frac{f_\Theta(\theta)\, p_{X \mid \Theta}(x \mid \theta)}{p_X(x)} \\
&= \frac{f_\Theta(\theta)\, p_{X \mid \Theta}(x \mid \theta)}{\int_u f_\Theta(u)\, p_{X \mid \Theta}(x \mid u)\, \mathrm{d}u} \\
&= \frac{\theta^{a-1} (1 - \theta)^{b-1} \binom{n}{x} \theta^x (1 - \theta)^{n-x}}{\int_u u^{a-1} (1 - u)^{b-1} \binom{n}{x} u^x (1 - u)^{n-x}\, \mathrm{d}u} \\
&= \frac{\theta^{x+a-1} (1 - \theta)^{n-x+b-1}}{\int_u u^{x+a-1} (1 - u)^{n-x+b-1}\, \mathrm{d}u} \\
&= f_\beta(\theta;\, x + a,\, n - x + b)
\end{aligned}$$
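A quick numerical check of this conjugacy result: multiply the beta prior by the binomial likelihood, normalize the product, and compare it to the predicted beta pdf. The parameter values are arbitrary:

```python
import numpy as np
from scipy.stats import beta, binom

# Illustrative values: prior beta(a, b), binomial with n trials, x successes
a, b, n, x = 2.0, 3.0, 20, 13
theta = np.linspace(1e-6, 1 - 1e-6, 2001)

# Unnormalized posterior: prior pdf times binomial likelihood
unnorm = beta.pdf(theta, a, b) * binom.pmf(x, n, theta)
posterior = unnorm / np.trapz(unnorm, theta)  # normalize numerically

# Conjugacy predicts a beta(x + a, n - x + b) posterior
predicted = beta.pdf(theta, x + a, n - x + b)
print(np.max(np.abs(posterior - predicted)))  # close to zero
```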

32 Poll in New Mexico. 429 participants: 227 people intend to vote for Clinton and 202 for Trump. What is the probability that Trump wins in New Mexico? Assumptions: the fraction of Trump voters is modeled as a random variable $\Theta$; poll participants are selected uniformly at random with replacement; the number of Trump voters in the poll is binomial with parameters $n = 429$ and $p = \Theta$.

33 Poll in New Mexico. The prior is uniform, so beta with parameters $a = 1$ and $b = 1$. The likelihood is binomial. The posterior is beta with parameters $a = 202 + 1 = 203$ and $b = 429 - 202 + 1 = 228$. The probability that Trump wins in New Mexico is the probability that $\Theta$ given the data is greater than 0.5.
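This tail probability can be evaluated with the survival function of the beta posterior. A minimal sketch, assuming SciPy:

```python
from scipy.stats import beta

# Posterior after observing 202 Trump voters out of 429 under a uniform prior
a_post, b_post = 202 + 1, 429 - 202 + 1  # beta(203, 228)

# P(Theta > 0.5 | data): survival function of the posterior at 0.5
print(beta.sf(0.5, a_post, b_post))  # roughly 0.114
```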

34 Poll in New Mexico [figure: posterior pdf; the region $\theta > 0.5$ has probability 11.4%]

35 Outline: Learning Bayesian models, Conjugate priors, Bayesian estimators.

36 Bayesian estimators. What estimator should we use? Two main options: the posterior mean and the posterior mode.

37 Posterior mean. The posterior mean is the mean of the posterior distribution,
$$\theta_{\mathrm{MMSE}}(\vec{x}) := \mathrm{E}\big(\Theta \mid \vec{X} = \vec{x}\big).$$
It is the minimum mean-square-error (MMSE) estimate: for any arbitrary estimator $\theta_{\mathrm{other}}(\vec{x})$,
$$\mathrm{E}\Big(\big(\theta_{\mathrm{other}}(\vec{X}) - \Theta\big)^2\Big) \ge \mathrm{E}\Big(\big(\theta_{\mathrm{MMSE}}(\vec{X}) - \Theta\big)^2\Big).$$

38–41 Posterior mean. Conditioned on $\vec{X} = \vec{x}$,
$$\begin{aligned}
\mathrm{E}\Big(\big(\theta_{\mathrm{other}}(\vec{X}) - \Theta\big)^2 \,\Big|\, \vec{X} = \vec{x}\Big)
&= \mathrm{E}\Big(\big(\theta_{\mathrm{other}}(\vec{X}) - \theta_{\mathrm{MMSE}}(\vec{X}) + \theta_{\mathrm{MMSE}}(\vec{X}) - \Theta\big)^2 \,\Big|\, \vec{X} = \vec{x}\Big) \\
&= \big(\theta_{\mathrm{other}}(\vec{x}) - \theta_{\mathrm{MMSE}}(\vec{x})\big)^2 + \mathrm{E}\Big(\big(\theta_{\mathrm{MMSE}}(\vec{X}) - \Theta\big)^2 \,\Big|\, \vec{X} = \vec{x}\Big) \\
&\quad + 2\,\big(\theta_{\mathrm{other}}(\vec{x}) - \theta_{\mathrm{MMSE}}(\vec{x})\big)\big(\theta_{\mathrm{MMSE}}(\vec{x}) - \mathrm{E}\big(\Theta \mid \vec{X} = \vec{x}\big)\big) \\
&= \big(\theta_{\mathrm{other}}(\vec{x}) - \theta_{\mathrm{MMSE}}(\vec{x})\big)^2 + \mathrm{E}\Big(\big(\theta_{\mathrm{MMSE}}(\vec{X}) - \Theta\big)^2 \,\Big|\, \vec{X} = \vec{x}\Big),
\end{aligned}$$
where the cross term vanishes because $\theta_{\mathrm{MMSE}}(\vec{x}) = \mathrm{E}(\Theta \mid \vec{X} = \vec{x})$ by definition.

42–45 Posterior mean. By iterated expectation,
$$\begin{aligned}
\mathrm{E}\Big(\big(\theta_{\mathrm{other}}(\vec{X}) - \Theta\big)^2\Big)
&= \mathrm{E}\Big(\mathrm{E}\Big(\big(\theta_{\mathrm{other}}(\vec{X}) - \Theta\big)^2 \,\Big|\, \vec{X}\Big)\Big) \\
&= \mathrm{E}\Big(\big(\theta_{\mathrm{other}}(\vec{X}) - \theta_{\mathrm{MMSE}}(\vec{X})\big)^2\Big) + \mathrm{E}\Big(\mathrm{E}\Big(\big(\theta_{\mathrm{MMSE}}(\vec{X}) - \Theta\big)^2 \,\Big|\, \vec{X}\Big)\Big) \\
&= \mathrm{E}\Big(\big(\theta_{\mathrm{other}}(\vec{X}) - \theta_{\mathrm{MMSE}}(\vec{X})\big)^2\Big) + \mathrm{E}\Big(\big(\theta_{\mathrm{MMSE}}(\vec{X}) - \Theta\big)^2\Big) \\
&\ge \mathrm{E}\Big(\big(\theta_{\mathrm{MMSE}}(\vec{X}) - \Theta\big)^2\Big)
\end{aligned}$$
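A Monte Carlo sketch of the MMSE property for the Bernoulli model with a uniform prior, comparing the posterior mean against the ML estimate. The sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 100_000

# Draw Theta from the uniform prior, then iid Bernoulli(Theta) data
theta = rng.uniform(size=trials)
x = rng.random((trials, n)) < theta[:, None]
n1 = x.sum(axis=1)

mmse = (n1 + 1) / (n + 2)  # posterior mean under the uniform prior
ml = n1 / n                # maximum-likelihood estimate

print(np.mean((mmse - theta) ** 2))  # smaller mean squared error
print(np.mean((ml - theta) ** 2))
```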

46 Bernoulli distribution: $n_0 = 1$, $n_1 = \dots$ [figure: posterior pdfs with the point estimates marked]

47 Bernoulli distribution: $n_0 = 3$, $n_1 = \dots$ [figure: posterior pdfs with the point estimates marked]

48 Bernoulli distribution: $n_0 = 91$, $n_1 = \dots$ [figure: posterior pdfs, marking the posterior mean (uniform prior), the posterior mean (skewed prior), and the ML estimator]

49 Posterior mode. The maximum-a-posteriori (MAP) estimator is the mode of the posterior distribution:
$$\theta_{\mathrm{MAP}}(\vec{x}) := \arg\max_\theta\, p_{\Theta \mid \vec{X}}(\theta \mid \vec{x})$$
if $\Theta$ is discrete, and
$$\theta_{\mathrm{MAP}}(\vec{x}) := \arg\max_\theta\, f_{\Theta \mid \vec{X}}(\theta \mid \vec{x})$$
if $\Theta$ is continuous.
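For a beta posterior, both Bayesian point estimates are available in closed form, which makes the contrast between mean and mode easy to see. A small sketch; the parameters are arbitrary, and the mode formula holds for $a, b > 1$:

```python
# Contrast the two Bayesian point estimates for a beta(a, b) posterior
a, b = 8, 4  # illustrative parameters
print(a / (a + b))            # posterior mean: 0.666...
print((a - 1) / (a + b - 2))  # posterior mode (MAP), for a, b > 1: 0.7
```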

50–54 Maximum-likelihood estimator. If the prior is uniform, the ML estimator coincides with the MAP estimator:
$$\begin{aligned}
\arg\max_\theta\, f_{\Theta \mid \vec{X}}(\theta \mid \vec{x})
&= \arg\max_\theta\, \frac{f_\Theta(\theta)\, f_{\vec{X} \mid \Theta}(\vec{x} \mid \theta)}{\int_u f_\Theta(u)\, f_{\vec{X} \mid \Theta}(\vec{x} \mid u)\, \mathrm{d}u} \\
&= \arg\max_\theta\, f_{\vec{X} \mid \Theta}(\vec{x} \mid \theta) \\
&= \arg\max_\theta\, \mathcal{L}_{\vec{x}}(\theta)
\end{aligned}$$
Uniform priors are only well defined over bounded domains.
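A numeric check of this identity for the Bernoulli example: under a uniform (beta(1, 1)) prior, the mode of the beta(n1 + 1, n0 + 1) posterior matches the ML estimate $n_1 / (n_0 + n_1)$. The counts are illustrative:

```python
import numpy as np
from scipy.stats import beta

n0, n1 = 3, 7  # illustrative counts
theta = np.linspace(0, 1, 100_001)

# Grid argmax of the posterior under the uniform prior
theta_map = theta[np.argmax(beta.pdf(theta, n1 + 1, n0 + 1))]
print(theta_map, n1 / (n0 + n1))  # both 0.7
```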

55 Probability of error. If $\Theta$ is discrete, the MAP estimator minimizes the probability of error: for any arbitrary estimator $\theta_{\mathrm{other}}(\vec{x})$,
$$\mathrm{P}\big(\theta_{\mathrm{other}}(\vec{X}) \ne \Theta\big) \ge \mathrm{P}\big(\theta_{\mathrm{MAP}}(\vec{X}) \ne \Theta\big).$$

56–60 Probability of error. Equivalently, the MAP estimator maximizes the probability of a correct guess:
$$\begin{aligned}
\mathrm{P}\big(\Theta = \theta_{\mathrm{other}}(\vec{X})\big)
&= \int_{\vec{x}} f_{\vec{X}}(\vec{x})\, \mathrm{P}\big(\Theta = \theta_{\mathrm{other}}(\vec{x}) \mid \vec{X} = \vec{x}\big)\, \mathrm{d}\vec{x} \\
&= \int_{\vec{x}} f_{\vec{X}}(\vec{x})\, p_{\Theta \mid \vec{X}}\big(\theta_{\mathrm{other}}(\vec{x}) \mid \vec{x}\big)\, \mathrm{d}\vec{x} \\
&\le \int_{\vec{x}} f_{\vec{X}}(\vec{x})\, p_{\Theta \mid \vec{X}}\big(\theta_{\mathrm{MAP}}(\vec{x}) \mid \vec{x}\big)\, \mathrm{d}\vec{x} \\
&= \mathrm{P}\big(\Theta = \theta_{\mathrm{MAP}}(\vec{X})\big)
\end{aligned}$$

61 Sending bits. Model for a communication channel: the signal $\Theta$ encodes a single bit. Prior knowledge indicates that a 0 is 3 times more likely than a 1:
$$p_\Theta(1) = \tfrac{1}{4}, \qquad p_\Theta(0) = \tfrac{3}{4}.$$
The channel is noisy, so we send the signal $n$ times. At the receiver we observe
$$X_i = \Theta + Z_i, \qquad 1 \le i \le n,$$
where $\vec{Z}$ is iid standard Gaussian.

62 Sending bits: ML estimator. The likelihood is equal to
$$\mathcal{L}_{\vec{x}}(\theta) = \prod_{i=1}^n f_{X_i \mid \Theta}(x_i \mid \theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x_i - \theta)^2}{2}}.$$
The log-likelihood is equal to
$$\log \mathcal{L}_{\vec{x}}(\theta) = -\sum_{i=1}^n \frac{(x_i - \theta)^2}{2} - \frac{n}{2} \log 2\pi.$$

63 Sending bits: ML estimator. $\theta_{\mathrm{ML}}(\vec{x}) = 1$ if
$$\log \mathcal{L}_{\vec{x}}(1) = -\sum_{i=1}^n \frac{x_i^2 - 2 x_i + 1}{2} - \frac{n}{2} \log 2\pi \ge -\sum_{i=1}^n \frac{x_i^2}{2} - \frac{n}{2} \log 2\pi = \log \mathcal{L}_{\vec{x}}(0).$$
Equivalently,
$$\theta_{\mathrm{ML}}(\vec{x}) = \begin{cases} 1 & \text{if } \frac{1}{n} \sum_{i=1}^n x_i > \frac{1}{2}, \\ 0 & \text{otherwise.} \end{cases}$$
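The decision rule reduces to thresholding the sample mean. A minimal sketch; the received samples below are simulated, not real channel data:

```python
import numpy as np

def theta_ml(x):
    """ML decision rule: decide 1 when the sample mean exceeds 1/2."""
    return 1 if np.mean(x) > 0.5 else 0

# Hypothetical received signal: bit 1 sent over the noisy channel
rng = np.random.default_rng(0)
x = 1 + rng.standard_normal(5)
print(theta_ml(x))
```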

64–67 Sending bits: ML estimator. The probability of error is
$$\begin{aligned}
\mathrm{P}\big(\Theta \ne \theta_{\mathrm{ML}}(\vec{X})\big)
&= \mathrm{P}\big(\Theta \ne \theta_{\mathrm{ML}}(\vec{X}) \mid \Theta = 0\big)\, \mathrm{P}(\Theta = 0) + \mathrm{P}\big(\Theta \ne \theta_{\mathrm{ML}}(\vec{X}) \mid \Theta = 1\big)\, \mathrm{P}(\Theta = 1) \\
&= \mathrm{P}\bigg(\frac{1}{n} \sum_{i=1}^n X_i > \frac{1}{2} \,\bigg|\, \Theta = 0\bigg)\, \mathrm{P}(\Theta = 0) + \mathrm{P}\bigg(\frac{1}{n} \sum_{i=1}^n X_i < \frac{1}{2} \,\bigg|\, \Theta = 1\bigg)\, \mathrm{P}(\Theta = 1) \\
&= Q\big(\sqrt{n}/2\big)
\end{aligned}$$

68–71 Sending bits: MAP estimator. The logarithm of the posterior is equal to
$$\begin{aligned}
\log p_{\Theta \mid \vec{X}}(\theta \mid \vec{x})
&= \log \frac{\prod_{i=1}^n f_{X_i \mid \Theta}(x_i \mid \theta)\, p_\Theta(\theta)}{f_{\vec{X}}(\vec{x})} \\
&= \sum_{i=1}^n \log f_{X_i \mid \Theta}(x_i \mid \theta) + \log p_\Theta(\theta) - \log f_{\vec{X}}(\vec{x}) \\
&= -\sum_{i=1}^n \frac{x_i^2 - 2 x_i \theta + \theta^2}{2} - \frac{n}{2} \log 2\pi + \log p_\Theta(\theta) - \log f_{\vec{X}}(\vec{x})
\end{aligned}$$

72 Sending bits: MAP estimator. $\theta_{\mathrm{MAP}}(\vec{x}) = 1$ if
$$\log p_{\Theta \mid \vec{X}}(1 \mid \vec{x}) + \log f_{\vec{X}}(\vec{x}) = -\sum_{i=1}^n \frac{x_i^2 - 2 x_i + 1}{2} - \frac{n}{2} \log 2\pi - \log 4 \ge -\sum_{i=1}^n \frac{x_i^2}{2} - \frac{n}{2} \log 2\pi - \log 4 + \log 3 = \log p_{\Theta \mid \vec{X}}(0 \mid \vec{x}) + \log f_{\vec{X}}(\vec{x}).$$
Equivalently,
$$\theta_{\mathrm{MAP}}(\vec{x}) = \begin{cases} 1 & \text{if } \frac{1}{n} \sum_{i=1}^n x_i > \frac{1}{2} + \frac{\log 3}{n}, \\ 0 & \text{otherwise.} \end{cases}$$

73–76 Sending bits: MAP estimator. The probability of error is
$$\begin{aligned}
\mathrm{P}\big(\Theta \ne \theta_{\mathrm{MAP}}(\vec{X})\big)
&= \mathrm{P}\big(\Theta \ne \theta_{\mathrm{MAP}}(\vec{X}) \mid \Theta = 0\big)\, \mathrm{P}(\Theta = 0) + \mathrm{P}\big(\Theta \ne \theta_{\mathrm{MAP}}(\vec{X}) \mid \Theta = 1\big)\, \mathrm{P}(\Theta = 1) \\
&= \mathrm{P}\bigg(\frac{1}{n} \sum_{i=1}^n X_i > \frac{1}{2} + \frac{\log 3}{n} \,\bigg|\, \Theta = 0\bigg)\, \mathrm{P}(\Theta = 0) + \mathrm{P}\bigg(\frac{1}{n} \sum_{i=1}^n X_i < \frac{1}{2} + \frac{\log 3}{n} \,\bigg|\, \Theta = 1\bigg)\, \mathrm{P}(\Theta = 1) \\
&= \frac{3}{4}\, Q\bigg(\frac{\sqrt{n}}{2} + \frac{\log 3}{\sqrt{n}}\bigg) + \frac{1}{4}\, Q\bigg(\frac{\sqrt{n}}{2} - \frac{\log 3}{\sqrt{n}}\bigg)
\end{aligned}$$
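A Monte Carlo sketch comparing the two decision rules against the closed-form error probabilities above; $n$ and the number of trials are arbitrary:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials = 4, 200_000
Q = norm.sf  # Gaussian tail function Q(x) = P(N(0, 1) > x)

# Draw bits from the prior P(0) = 3/4, P(1) = 1/4; observe n noisy copies
theta = (rng.random(trials) < 0.25).astype(int)
x_mean = theta + rng.standard_normal((trials, n)).mean(axis=1)

ml = (x_mean > 0.5).astype(int)                    # ML decision rule
map_ = (x_mean > 0.5 + np.log(3) / n).astype(int)  # MAP decision rule

# Empirical error rates vs. the analytical expressions
print(np.mean(ml != theta), Q(np.sqrt(n) / 2))
print(np.mean(map_ != theta),
      0.75 * Q(np.sqrt(n) / 2 + np.log(3) / np.sqrt(n))
      + 0.25 * Q(np.sqrt(n) / 2 - np.log(3) / np.sqrt(n)))
```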

77 Sending bits: probability of error [figure: probability of error versus $n$ for the ML and MAP estimators]
