Bayesian statistics. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
2 Frequentist vs Bayesian statistics In frequentist statistics the data are modeled as realizations from a distribution that depends on deterministic parameters. In Bayesian statistics the parameters themselves are modeled as random variables. This allows us to quantify our prior uncertainty and to incorporate additional information.
3 Learning Bayesian models Conjugate priors Bayesian estimators
4 Prior distribution and likelihood The data x ∈ R^n are a realization of a random vector X, which depends on a vector of parameters Θ. Modeling choices: Prior distribution: distribution of Θ encoding our uncertainty about the model before seeing the data. Likelihood: conditional distribution of X given Θ.
5 Posterior distribution The posterior distribution is the conditional distribution of Θ given X. Evaluating the posterior at the data x allows us to update our uncertainty about Θ using the data.
6 Bernoulli distribution Goal: estimate a Bernoulli parameter from iid data. We consider two different Bayesian estimators Θ1 and Θ2: 1. Θ1 is a conservative estimator with a uniform prior pdf, f_{Θ1}(θ) = 1 for 0 ≤ θ ≤ 1 and 0 otherwise. 2. Θ2 has a prior pdf skewed towards 1, f_{Θ2}(θ) = 2θ for 0 ≤ θ ≤ 1 and 0 otherwise.
7 Prior distributions
8 Bernoulli distribution: likelihood The data are assumed to be iid, so the likelihood is p_{X|Θ}(x | θ) = θ^{n1} (1 − θ)^{n0}, where n0 is the number of zeros and n1 the number of ones.
10 Bernoulli distribution: posterior distribution
f_{Θ1|X}(θ | x) = f_{Θ1}(θ) p_{X|Θ1}(x | θ) / p_X(x)
= f_{Θ1}(θ) p_{X|Θ1}(x | θ) / ∫_u f_{Θ1}(u) p_{X|Θ1}(x | u) du
= θ^{n1} (1 − θ)^{n0} / ∫_u u^{n1} (1 − u)^{n0} du
= θ^{n1} (1 − θ)^{n0} / β(n1 + 1, n0 + 1)
where β(a, b) := ∫_u u^{a−1} (1 − u)^{b−1} du
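As a quick numerical sanity check (a sketch, not part of the lecture), the normalizing constant of the unnormalized posterior θ^{n1} (1 − θ)^{n0} can be compared against the closed-form beta function β(n1 + 1, n0 + 1); the counts n1 = 4, n0 = 7 are arbitrary illustrative values:

```python
import math

# Check that integrating theta^n1 (1 - theta)^n0 over [0, 1] matches the
# beta function beta(a, b) = Gamma(a) Gamma(b) / Gamma(a + b) with
# a = n1 + 1, b = n0 + 1, as used to normalize the posterior.

def beta_function(a, b):
    # Computed through log-gamma for numerical stability
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def normalizing_constant(n1, n0, steps=200_000):
    # Midpoint-rule integral of theta^n1 (1 - theta)^n0 over [0, 1]
    h = 1.0 / steps
    return sum(((k + 0.5) * h) ** n1 * (1 - (k + 0.5) * h) ** n0
               for k in range(steps)) * h

n1, n0 = 4, 7  # hypothetical counts of ones and zeros
numeric = normalizing_constant(n1, n0)
closed_form = beta_function(n1 + 1, n0 + 1)
print(numeric, closed_form)
```

The two printed values should agree to many decimal places, confirming that the posterior under the uniform prior is exactly a beta density.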
15 Bernoulli distribution: posterior distribution
f_{Θ2|X}(θ | x) = f_{Θ2}(θ) p_{X|Θ2}(x | θ) / p_X(x)
= θ^{n1+1} (1 − θ)^{n0} / ∫_u u^{n1+1} (1 − u)^{n0} du
= θ^{n1+1} (1 − θ)^{n0} / β(n1 + 2, n0 + 1)
where β(a, b) := ∫_u u^{a−1} (1 − u)^{b−1} du
19 Bernoulli distribution: [plots of the posterior pdfs for n0 = 1, n0 = 3 and n0 = 91 (the n1 values did not survive transcription); legend: posterior mean (uniform prior), posterior mean (skewed prior), ML estimator]
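The three estimators on the plot slides have simple closed forms: the posterior mean of a Beta(a, b) variable is a / (a + b), so the uniform-prior posterior Beta(n1 + 1, n0 + 1) gives (n1 + 1) / (n + 2), the skewed-prior posterior Beta(n1 + 2, n0 + 1) gives (n1 + 2) / (n + 3), and the ML estimate is n1 / n. Since the n1 values on the slides did not survive transcription, the counts below are hypothetical:

```python
# Compare the two posterior-mean estimates with the ML estimate of the
# Bernoulli parameter; the (n0, n1) pairs are illustrative placeholders.

def estimates(n0, n1):
    n = n0 + n1
    return {
        "posterior mean (uniform prior)": (n1 + 1) / (n + 2),  # Beta(n1+1, n0+1)
        "posterior mean (skewed prior)":  (n1 + 2) / (n + 3),  # Beta(n1+2, n0+1)
        "ML estimator": n1 / n,
    }

for n0, n1 in [(1, 3), (3, 9), (91, 9)]:  # hypothetical counts
    print(n0, n1, estimates(n0, n1))
```

For small samples the prior visibly pulls the estimates away from n1 / n; as n grows all three converge, which is the behavior the plots illustrate.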
22 Learning Bayesian models Conjugate priors Bayesian estimators
23 Beta distribution The pdf of a beta distribution with parameters a and b is defined as f_β(θ; a, b) := θ^{a−1} (1 − θ)^{b−1} / β(a, b) for 0 ≤ θ ≤ 1 and 0 otherwise, where β(a, b) := ∫_u u^{a−1} (1 − u)^{b−1} du
24 Learning a Bernoulli distribution The first prior is beta with parameters a = 1 and b = 1. The second prior is beta with parameters a = 2 and b = 1. The posteriors are beta with parameters a = n1 + 1, b = n0 + 1 and a = n1 + 2, b = n0 + 1, respectively.
25 Conjugate priors A conjugate family of distributions for a certain likelihood satisfies the following property: If the prior belongs to the family, the posterior also belongs to the family Beta distributions are conjugate priors when the likelihood is binomial
26 The beta distribution is conjugate to the binomial likelihood Θ is beta with parameters a and b; X is binomial with parameters n and Θ.
f_{Θ|X}(θ | x) = f_Θ(θ) p_{X|Θ}(x | θ) / p_X(x)
= f_Θ(θ) p_{X|Θ}(x | θ) / ∫_u f_Θ(u) p_{X|Θ}(x | u) du
= θ^{a−1} (1 − θ)^{b−1} C(n, x) θ^x (1 − θ)^{n−x} / ∫_u u^{a−1} (1 − u)^{b−1} C(n, x) u^x (1 − u)^{n−x} du
= θ^{x+a−1} (1 − θ)^{n−x+b−1} / ∫_u u^{x+a−1} (1 − u)^{n−x+b−1} du
= f_β(θ; x + a, n − x + b)
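This conjugacy can be verified numerically (a sketch with illustrative parameter values): normalizing the product prior × likelihood on a grid should reproduce the Beta(x + a, n − x + b) pdf.

```python
import math

# Numerical check of beta/binomial conjugacy: the normalized product
# f_Theta(theta) * p_{X|Theta}(x|theta) should equal the Beta(x+a, n-x+b) pdf.

def beta_fn(a, b):
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def beta_pdf(theta, a, b):
    return theta ** (a - 1) * (1 - theta) ** (b - 1) / beta_fn(a, b)

a, b, n, x = 2.0, 3.0, 20, 7  # illustrative prior parameters and data
grid = [(k + 0.5) / 10_000 for k in range(10_000)]
unnorm = [beta_pdf(t, a, b) * math.comb(n, x) * t ** x * (1 - t) ** (n - x)
          for t in grid]
z = sum(unnorm) / len(grid)   # numerical normalizing constant p_X(x)
posterior = [u / z for u in unnorm]
target = [beta_pdf(t, x + a, n - x + b) for t in grid]
max_err = max(abs(p - q) for p, q in zip(posterior, target))
print(max_err)
```

The maximum pointwise discrepancy is tiny, limited only by the quadrature, so the posterior is indeed in the beta family.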
32 Poll in New Mexico 449 participants: 227 intend to vote for Clinton and 202 for Trump. What is the probability that Trump wins in New Mexico? Assumptions: the fraction of Trump voters is modeled as a random variable Θ; poll participants are selected uniformly at random with replacement; the number of Trump voters in the poll is binomial with parameters n = 449 and p = Θ
33 Poll in New Mexico The prior is uniform, so beta with parameters a = 1 and b = 1. The likelihood is binomial. The posterior is beta with parameters a = 202 + 1 = 203 and b = 449 − 202 + 1 = 248. The probability that Trump wins in New Mexico is the probability that Θ given the data is greater than 0.5.
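The winning probability can be estimated by Monte Carlo from the posterior (a sketch under the stated model: uniform prior, binomial likelihood with n = 449 and x = 202 Trump voters, hence posterior Beta(203, 248); this is not the lecture's own code):

```python
import random

# Monte Carlo estimate of P(Theta > 0.5 | data) under the Beta(203, 248)
# posterior implied by the binomial model with a uniform prior.

random.seed(0)
a, b = 202 + 1, 449 - 202 + 1       # posterior parameters
draws = 200_000
p_trump_wins = sum(random.betavariate(a, b) > 0.5
                   for _ in range(draws)) / draws
print(p_trump_wins)
```

The posterior mean is 203 / 451 ≈ 0.45, so most of the posterior mass lies below 0.5 and the estimated winning probability is small.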
34 Poll in New Mexico [plot of the posterior pdf; of the probabilities shown, only the value 11.4% survived transcription]
35 Learning Bayesian models Conjugate priors Bayesian estimators
36 Bayesian estimators What estimator should we use? Two main options: the posterior mean and the posterior mode
37 Posterior mean The posterior mean θ_MMSE(x) := E(Θ | X = x) is the minimum mean-square-error (MMSE) estimate: for any arbitrary estimator θ_other(x), E((θ_other(X) − Θ)²) ≥ E((θ_MMSE(X) − Θ)²)
38 Posterior mean
E((θ_other(X) − Θ)² | X = x)
= E((θ_other(X) − θ_MMSE(X) + θ_MMSE(X) − Θ)² | X = x)
= (θ_other(x) − θ_MMSE(x))² + E((θ_MMSE(X) − Θ)² | X = x) + 2 (θ_other(x) − θ_MMSE(x)) (θ_MMSE(x) − E(Θ | X = x))
= (θ_other(x) − θ_MMSE(x))² + E((θ_MMSE(X) − Θ)² | X = x)
where the cross term vanishes because θ_MMSE(x) = E(Θ | X = x)
42 Posterior mean By iterated expectation,
E((θ_other(X) − Θ)²) = E(E((θ_other(X) − Θ)² | X))
= E((θ_other(X) − θ_MMSE(X))²) + E(E((θ_MMSE(X) − Θ)² | X))
= E((θ_other(X) − θ_MMSE(X))²) + E((θ_MMSE(X) − Θ)²)
≥ E((θ_MMSE(X) − Θ)²)
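The MMSE property can be illustrated by simulation (a sketch with a uniform prior): drawing Θ ~ Uniform(0, 1) and X ~ Binomial(n, Θ), the posterior mean (x + 1) / (n + 2) should achieve a lower mean-square error than the ML estimate x / n.

```python
import random

# Monte Carlo comparison of the posterior mean with the ML estimator under
# a uniform prior on Theta and a Binomial(n, Theta) observation.

random.seed(1)
n, trials = 10, 100_000
se_mmse = se_ml = 0.0
for _ in range(trials):
    theta = random.random()                              # Theta ~ Uniform(0, 1)
    x = sum(random.random() < theta for _ in range(n))   # X ~ Binomial(n, Theta)
    se_mmse += ((x + 1) / (n + 2) - theta) ** 2          # posterior mean error
    se_ml += (x / n - theta) ** 2                        # ML error
print(se_mmse / trials, se_ml / trials)
```

In this setting the exact Bayes risks are 1 / (6(n + 2)) for the posterior mean and 1 / (6n) for the ML estimate, so the simulated gap matches the theorem.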
46 Bernoulli distribution: [posterior plots revisited for n0 = 1, n0 = 3 and n0 = 91 (the n1 values did not survive transcription), marking the posterior mean (uniform prior), the posterior mean (skewed prior) and the ML estimator]
49 Posterior mode The maximum-a-posteriori (MAP) estimator is the mode of the posterior distribution: θ_MAP(x) := arg max_θ p_{Θ|X}(θ | x) if Θ is discrete, and θ_MAP(x) := arg max_θ f_{Θ|X}(θ | x) if Θ is continuous
50 Maximum-likelihood estimator If the prior is uniform, the ML estimator coincides with the MAP estimator:
arg max_θ f_{Θ|X}(θ | x) = arg max_θ f_Θ(θ) f_{X|Θ}(x | θ) / ∫_u f_Θ(u) f_{X|Θ}(x | u) du
= arg max_θ f_{X|Θ}(x | θ)
= arg max_θ L_x(θ)
Note that uniform priors are only well defined over bounded domains
55 Probability of error If Θ is discrete, the MAP estimator minimizes the probability of error: for any arbitrary estimator θ_other(x), P(θ_other(X) ≠ Θ) ≥ P(θ_MAP(X) ≠ Θ)
56 Probability of error
P(Θ = θ_other(X)) = ∫_x f_X(x) P(Θ = θ_other(x) | X = x) dx
= ∫_x f_X(x) p_{Θ|X}(θ_other(x) | x) dx
≤ ∫_x f_X(x) p_{Θ|X}(θ_MAP(x) | x) dx
= P(Θ = θ_MAP(X))
61 Sending bits Model for a communication channel: the signal Θ encodes a single bit. Prior knowledge indicates that a 0 is 3 times more likely than a 1: p_Θ(1) = 1/4, p_Θ(0) = 3/4. The channel is noisy, so we send the signal n times. At the receiver we observe X_i = Θ + Z_i, 1 ≤ i ≤ n, where the Z_i are iid standard Gaussians
62 Sending bits: ML estimator The likelihood is L_x(θ) = ∏_{i=1}^n f_{X_i|Θ}(x_i | θ) = ∏_{i=1}^n (1/√(2π)) e^{−(x_i − θ)²/2}, so the log-likelihood is log L_x(θ) = −Σ_{i=1}^n (x_i − θ)²/2 − (n/2) log 2π
63 Sending bits: ML estimator θ_ML(x) = 1 if
log L_x(1) = −Σ_{i=1}^n (x_i² − 2 x_i + 1)/2 − (n/2) log 2π ≥ −Σ_{i=1}^n x_i²/2 − (n/2) log 2π = log L_x(0)
Equivalently, θ_ML(x) = 1 if (1/n) Σ_{i=1}^n x_i > 1/2, and 0 otherwise
64 Sending bits: ML estimator The probability of error is
P(Θ ≠ θ_ML(X)) = P(Θ ≠ θ_ML(X) | Θ = 0) P(Θ = 0) + P(Θ ≠ θ_ML(X) | Θ = 1) P(Θ = 1)
= P((1/n) Σ_{i=1}^n X_i > 1/2 | Θ = 0) P(Θ = 0) + P((1/n) Σ_{i=1}^n X_i < 1/2 | Θ = 1) P(Θ = 1)
= Q(√n / 2)
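A simulation sketch can confirm the closed form: decide 1 when the sample mean exceeds 1/2, and compare the empirical error rate with Q(√n / 2), where Q(t) is the standard Gaussian tail probability (computable from the complementary error function).

```python
import math
import random

# Simulate the channel and the ML decision rule; the error rate should
# match Q(sqrt(n) / 2) regardless of the prior, since both conditional
# error probabilities equal Q(sqrt(n) / 2).

def Q(t):
    # Standard Gaussian tail: Q(t) = P(N(0,1) > t)
    return 0.5 * math.erfc(t / math.sqrt(2))

random.seed(2)
n, trials = 9, 200_000
errors = 0
for _ in range(trials):
    theta = 0 if random.random() < 0.75 else 1          # p(0) = 3/4, p(1) = 1/4
    mean = sum(theta + random.gauss(0, 1) for _ in range(n)) / n
    decision = 1 if mean > 0.5 else 0                   # ML threshold at 1/2
    errors += decision != theta
print(errors / trials, Q(math.sqrt(n) / 2))
```

The two printed numbers agree up to Monte Carlo noise.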
68 Sending bits: MAP estimator The logarithm of the posterior is
log p_{Θ|X}(θ | x) = log( ∏_{i=1}^n f_{X_i|Θ}(x_i | θ) p_Θ(θ) / f_X(x) )
= Σ_{i=1}^n log f_{X_i|Θ}(x_i | θ) + log p_Θ(θ) − log f_X(x)
= −Σ_{i=1}^n (x_i² − 2 x_i θ + θ²)/2 − (n/2) log 2π + log p_Θ(θ) − log f_X(x)
72 Sending bits: MAP estimator θ_MAP(x) = 1 if
log p_{Θ|X}(1 | x) + log f_X(x) = −Σ_{i=1}^n (x_i² − 2 x_i + 1)/2 − (n/2) log 2π − log 4
≥ −Σ_{i=1}^n x_i²/2 − (n/2) log 2π − log 4 + log 3 = log p_{Θ|X}(0 | x) + log f_X(x)
Equivalently, θ_MAP(x) = 1 if (1/n) Σ_{i=1}^n x_i > 1/2 + log 3 / n, and 0 otherwise
73 Sending bits: MAP estimator The probability of error is
P(Θ ≠ θ_MAP(X)) = P(Θ ≠ θ_MAP(X) | Θ = 0) P(Θ = 0) + P(Θ ≠ θ_MAP(X) | Θ = 1) P(Θ = 1)
= P((1/n) Σ_{i=1}^n X_i > 1/2 + log 3 / n | Θ = 0) P(Θ = 0) + P((1/n) Σ_{i=1}^n X_i < 1/2 + log 3 / n | Θ = 1) P(Θ = 1)
= (3/4) Q(√n / 2 + log 3 / √n) + (1/4) Q(√n / 2 − log 3 / √n)
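Comparing the two rules by simulation (a sketch with an illustrative n) shows the effect of the shifted threshold 1/2 + log 3 / n: the MAP rule trades a few extra errors when Θ = 1 for many fewer when Θ = 0, which is the more likely bit, so its overall error rate is lower and matches the closed form above.

```python
import math
import random

# Simulate ML (threshold 1/2) against MAP (threshold 1/2 + log(3)/n) under
# the prior p(0) = 3/4, p(1) = 1/4, and check the MAP error formula.

def Q(t):
    return 0.5 * math.erfc(t / math.sqrt(2))

random.seed(3)
n, trials = 9, 200_000
t_map = 0.5 + math.log(3) / n
err_ml = err_map = 0
for _ in range(trials):
    theta = 0 if random.random() < 0.75 else 1
    mean = sum(theta + random.gauss(0, 1) for _ in range(n)) / n
    err_ml += (1 if mean > 0.5 else 0) != theta
    err_map += (1 if mean > t_map else 0) != theta
closed_form = (0.75 * Q(math.sqrt(n) / 2 + math.log(3) / math.sqrt(n))
               + 0.25 * Q(math.sqrt(n) / 2 - math.log(3) / math.sqrt(n)))
print(err_ml / trials, err_map / trials, closed_form)
```

The simulated MAP error rate tracks the closed form and stays below the ML error rate, as the plot on the next slide illustrates across n.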
77 Sending bits: probability of error [plot of the probability of error of the ML and MAP estimators as a function of n]
More informationCS 361: Probability & Statistics
October 17, 2017 CS 361: Probability & Statistics Inference Maximum likelihood: drawbacks A couple of things might trip up max likelihood estimation: 1) Finding the maximum of some functions can be quite
More informationMultiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar
Multiple regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression 1 / 36 Previous two lectures Linear and logistic
More informationCS 361: Probability & Statistics
March 14, 2018 CS 361: Probability & Statistics Inference The prior From Bayes rule, we know that we can express our function of interest as Likelihood Prior Posterior The right hand side contains the
More informationUse of the likelihood principle in physics. Statistics II
Use of the likelihood principle in physics Statistics II 1 2 3 + Bayesians vs Frequentists 4 Why ML does work? hypothesis observation 5 6 7 8 9 10 11 ) 12 13 14 15 16 Fit of Histograms corresponds This
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables
More informationRandom Processes. DS GA 1002 Probability and Statistics for Data Science.
Random Processes DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Aim Modeling quantities that evolve in time (or space)
More informationParameter Estimation. Industrial AI Lab.
Parameter Estimation Industrial AI Lab. Generative Model X Y w y = ω T x + ε ε~n(0, σ 2 ) σ 2 2 Maximum Likelihood Estimation (MLE) Estimate parameters θ ω, σ 2 given a generative model Given observed
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide
More informationExpectation Maximization
Expectation Maximization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationAarti Singh. Lecture 2, January 13, Reading: Bishop: Chap 1,2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell
Machine Learning 0-70/5 70/5-78, 78, Spring 00 Probability 0 Aarti Singh Lecture, January 3, 00 f(x) µ x Reading: Bishop: Chap, Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell Announcements Homework
More informationSTAT J535: Chapter 5: Classes of Bayesian Priors
STAT J535: Chapter 5: Classes of Bayesian Priors David B. Hitchcock E-Mail: hitchcock@stat.sc.edu Spring 2012 The Bayesian Prior A prior distribution must be specified in a Bayesian analysis. The choice
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationLinear Models. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.
Linear Models DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Linear regression Least-squares estimation
More informationBayesian Statistics Part III: Building Bayes Theorem Part IV: Prior Specification
Bayesian Statistics Part III: Building Bayes Theorem Part IV: Prior Specification Michael Anderson, PhD Hélène Carabin, DVM, PhD Department of Biostatistics and Epidemiology The University of Oklahoma
More informationVector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.
Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar
More informationBayesian Methods: Naïve Bayes
Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013
Bayesian Methods Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2013 1 What about prior n Billionaire says: Wait, I know that the thumbtack is close to 50-50. What can you
More informationCOMP 551 Applied Machine Learning Lecture 19: Bayesian Inference
COMP 551 Applied Machine Learning Lecture 19: Bayesian Inference Associate Instructor: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise noted, all material posted
More informationINTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP
INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP Personal Healthcare Revolution Electronic health records (CFH) Personal genomics (DeCode, Navigenics, 23andMe) X-prize: first $10k human genome technology
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationExpectation Propagation Algorithm
Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample
More informationParametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory
Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007
More informationProbabilistic modeling. The slides are closely adapted from Subhransu Maji s slides
Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework
More information