DS-GA 1002 Lecture notes 11, Fall 2016: Bayesian statistics
In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian statistics these parameters are modeled as random variables, which allows us to quantify our uncertainty about them.

1 Learning Bayesian models

We consider the problem of learning a parametrized distribution from a set of data. In the Bayesian framework, we model the data $x \in \mathbb{R}^n$ as a realization of a random vector $X$, which depends on a vector of parameters $\Theta$ that is also random. This requires two modeling choices:

1. The prior distribution is the distribution of $\Theta$, which encodes our uncertainty about the model before seeing the data.

2. The likelihood is the conditional distribution of $X$ given $\Theta$, which specifies how the data depend on the parameters. In contrast to the frequentist framework, the likelihood is not interpreted as a deterministic function of the parameters.

Our goal when learning a Bayesian model is to compute the posterior distribution of the parameters $\Theta$ given $X$. Evaluating this posterior distribution at the realization $x$ allows us to update our uncertainty about $\Theta$ using the data. The following example illustrates Bayesian inference applied to fitting the parameter of a Bernoulli random variable from iid realizations.

Example 1.1 (Bernoulli distribution). Let $x$ be a vector of data that we wish to model as iid samples from a Bernoulli distribution. Since we are taking a Bayesian approach, we choose a prior distribution for the parameter of the Bernoulli. We will consider two different Bayesian estimators $\Theta_1$ and $\Theta_2$:

1. $\Theta_1$ represents a conservative estimator in terms of prior information. We assign a uniform pdf to the parameter, so that any value in the unit interval has the same probability density:
$$f_{\Theta_1}(\theta) = \begin{cases} 1 & \text{for } 0 \le \theta \le 1, \\ 0 & \text{otherwise.} \end{cases}$$
[Figure 1: Posterior distributions of the parameter of a Bernoulli for two different priors and for different data realizations. The panels show the prior distributions and the posteriors for different values of $n_0$ and $n_1$; the posterior mean under the uniform prior, the posterior mean under the skewed prior, and the ML estimator are marked.]
2. $\Theta_2$ is an estimator that assumes that the parameter is closer to 1 than to 0. We could use it, for instance, to capture the suspicion that a coin is biased towards heads. We choose a skewed pdf that increases linearly from zero to one:
$$f_{\Theta_2}(\theta) = \begin{cases} 2\theta & \text{for } 0 \le \theta \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

By the iid assumption, the likelihood, which is just the conditional pmf of the data given the parameter of the Bernoulli, equals
$$p_{X \mid \Theta}(x \mid \theta) = \theta^{n_1} (1-\theta)^{n_0},$$
where $n_1$ is the number of ones in the data and $n_0$ the number of zeros (see Example 1.3 in Lecture Notes 9). The posterior pdfs of the two estimators are consequently equal to
$$f_{\Theta_1 \mid X}(\theta \mid x) = \frac{f_{\Theta_1}(\theta)\, p_{X \mid \Theta_1}(x \mid \theta)}{p_X(x)} = \frac{f_{\Theta_1}(\theta)\, p_{X \mid \Theta_1}(x \mid \theta)}{\int_u f_{\Theta_1}(u)\, p_{X \mid \Theta_1}(x \mid u)\, du} = \frac{\theta^{n_1} (1-\theta)^{n_0}}{\int_u u^{n_1} (1-u)^{n_0}\, du} = \frac{\theta^{n_1} (1-\theta)^{n_0}}{\beta(n_1 + 1, n_0 + 1)},$$
$$f_{\Theta_2 \mid X}(\theta \mid x) = \frac{f_{\Theta_2}(\theta)\, p_{X \mid \Theta_2}(x \mid \theta)}{p_X(x)} = \frac{\theta^{n_1 + 1} (1-\theta)^{n_0}}{\int_u u^{n_1 + 1} (1-u)^{n_0}\, du} = \frac{\theta^{n_1 + 1} (1-\theta)^{n_0}}{\beta(n_1 + 2, n_0 + 1)},$$
where
$$\beta(a, b) := \int_u u^{a-1} (1-u)^{b-1}\, du$$
is a special function called the beta function or Euler integral of the first kind, which is tabulated. Figure 1 shows the posterior distributions for different values of $n_1$ and $n_0$. It also shows the maximum-likelihood estimator of the parameter, which is just $n_1 / (n_0 + n_1)$ (see Example 1.3 in Lecture Notes 9). For a small number of flips, the posterior pdf of $\Theta_2$ is skewed to the right with respect to that of $\Theta_1$, reflecting the prior belief that the parameter is closer to 1. However, for a large number of flips both posterior densities are very close.
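The two posteriors can be evaluated numerically; as the next section makes explicit, they are beta densities with parameters $(n_1 + 1, n_0 + 1)$ and $(n_1 + 2, n_0 + 1)$ respectively. The following minimal Python sketch (the counts are illustrative, not the ones used in Figure 1) evaluates both densities on a grid and prints their means next to the ML estimate $n_1/(n_0 + n_1)$.

```python
import numpy as np
from scipy.stats import beta

# Illustrative counts (hypothetical values, not the ones used in Figure 1).
n0, n1 = 3, 1   # number of zeros and ones observed

# Posterior under the uniform prior (Theta_1): Beta(n1 + 1, n0 + 1).
post_uniform = beta(n1 + 1, n0 + 1)
# Posterior under the skewed prior (Theta_2): Beta(n1 + 2, n0 + 1).
post_skewed = beta(n1 + 2, n0 + 1)

theta = np.linspace(0, 1, 501)
f1 = post_uniform.pdf(theta)   # f_{Theta_1 | X}(theta | x)
f2 = post_skewed.pdf(theta)    # f_{Theta_2 | X}(theta | x)

print("ML estimate:             ", n1 / (n0 + n1))
print("posterior mean (uniform):", post_uniform.mean())
print("posterior mean (skewed): ", post_skewed.mean())
print("posterior mode (uniform):", theta[np.argmax(f1)])
```

As the counts grow, the two posteriors concentrate around the same value, consistent with the last observation in the example.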
2 Conjugate priors

Both posterior distributions in Example 1.1 are beta distributions.

Definition 2.1 (Beta distribution). The pdf of a beta distribution with parameters $a$ and $b$ is defined as
$$f_\beta(\theta; a, b) := \begin{cases} \dfrac{\theta^{a-1} (1-\theta)^{b-1}}{\beta(a, b)} & \text{if } 0 \le \theta \le 1, \\ 0 & \text{otherwise,} \end{cases}$$
where
$$\beta(a, b) := \int_u u^{a-1} (1-u)^{b-1}\, du.$$

In fact, the uniform prior of $\Theta_1$ in Example 1.1 is also a beta distribution: its parameters are $a = 1$ and $b = 1$. This is also the case for the skewed prior of $\Theta_2$: it is a beta distribution with parameters $a = 2$ and $b = 1$. Since the prior and the posterior belong to the same family, we can view the posterior as an updated estimate of the parameters of the distribution. When the prior and posterior are guaranteed to belong to the same family of distributions for a particular likelihood, the distributions are called conjugate priors.

Definition 2.2 (Conjugate priors). A conjugate family of distributions for a certain likelihood satisfies the following property: if the prior belongs to the family, then the posterior also belongs to the family.

Beta distributions are conjugate priors when the likelihood is binomial.

Theorem 2.3 (The beta distribution is conjugate to the binomial likelihood). If the prior distribution of $\Theta$ is a beta distribution with parameters $a$ and $b$, and conditioned on $\Theta = \theta$ the data $X$ are binomial with parameters $n$ and $\theta$, then the posterior distribution of $\Theta$ given $X = x$ is a beta distribution with parameters $x + a$ and $n - x + b$.
[Figure 2: Posterior distribution of the fraction of Trump voters in New Mexico conditioned on the poll data in Example 2.4; the shaded region where the fraction exceeds 0.5 has probability 11.4%.]

Proof.
$$f_{\Theta \mid X}(\theta \mid x) = \frac{f_\Theta(\theta)\, p_{X \mid \Theta}(x \mid \theta)}{p_X(x)} = \frac{f_\Theta(\theta)\, p_{X \mid \Theta}(x \mid \theta)}{\int_u f_\Theta(u)\, p_{X \mid \Theta}(x \mid u)\, du}$$
$$= \frac{\theta^{a-1} (1-\theta)^{b-1} \binom{n}{x} \theta^{x} (1-\theta)^{n-x}}{\int_u u^{a-1} (1-u)^{b-1} \binom{n}{x} u^{x} (1-u)^{n-x}\, du} = \frac{\theta^{x+a-1} (1-\theta)^{n-x+b-1}}{\int_u u^{x+a-1} (1-u)^{n-x+b-1}\, du}$$
$$= f_\beta(\theta;\, x + a,\, n - x + b).$$

Note that the posteriors obtained in Example 1.1 follow immediately from the theorem.

Example 2.4 (Poll in New Mexico). In a poll in New Mexico with 449 participants, 227 people intend to vote for Clinton and 202 for Trump (the data are from a real poll, but for simplicity we are ignoring the other candidates and the people who were undecided). Our aim is to use a Bayesian framework to predict the outcome of the election in New Mexico using these data. We model the fraction of people that vote for Trump as a random variable $\Theta$. We assume that the $n$ people in the poll are chosen uniformly at random with replacement from the population, so given $\Theta = \theta$ the number of Trump voters is binomial with parameters $n$ and $\theta$. We don't have any additional information about the possible value of $\Theta$, so we assume it is uniform, or equivalently a beta distribution with parameters $a := 1$ and $b := 1$. By Theorem 2.3, the posterior distribution of $\Theta$ given the data that we observe is a beta distribution with parameters $a := 203$ and $b := 228$. The corresponding probability that $\Theta \ge 0.5$ is 11.4%, which is our estimate for the probability that Trump wins in New Mexico.
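As a quick numerical check of Example 2.4, the posterior and the winning probability can be computed with scipy (a minimal sketch; the beta-binomial update follows Theorem 2.3):

```python
from scipy.stats import beta

# Poll data from Example 2.4: Trump count x out of n two-party respondents.
x = 202
n = 202 + 227

# Uniform prior Beta(a, b) with a = b = 1; Theorem 2.3 gives the posterior.
a, b = 1, 1
posterior = beta(x + a, n - x + b)   # Beta(203, 228)

# Probability that the fraction of Trump voters is at least 0.5.
print(posterior.sf(0.5))             # roughly 0.114, i.e. 11.4%
```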
3 Bayesian estimators

As we have seen, the Bayesian approach to learning probabilistic models yields the posterior distribution of the parameters of interest, as opposed to a single estimate. In this section we describe two possible estimators that are derived from the posterior distribution.

3.1 Minimum mean-square-error estimation

The mean of the posterior distribution is the conditional expectation of the parameters given the data. Choosing the posterior mean as an estimator for the parameters $\Theta$ has a strong theoretical justification: it is guaranteed to achieve the minimum mean square error (MSE) among all possible estimators.

Theorem 3.1 (The posterior mean minimizes the MSE). The posterior mean is the minimum mean-square-error (MMSE) estimate of the parameter $\Theta$ given the data $X$. To be more precise, let us define
$$\theta_{\mathrm{MMSE}}(x) := \mathrm{E}(\Theta \mid X = x).$$
For any arbitrary estimator $\theta_{\mathrm{other}}(x)$,
$$\mathrm{E}\left( (\theta_{\mathrm{other}}(X) - \Theta)^2 \right) \ge \mathrm{E}\left( (\theta_{\mathrm{MMSE}}(X) - \Theta)^2 \right).$$
Proof. We begin by computing the MSE of the arbitrary estimator conditioned on $X = x$ in terms of the conditional expectation of $\Theta$ given $X$:
$$\mathrm{E}\left( (\theta_{\mathrm{other}}(X) - \Theta)^2 \mid X = x \right) = \mathrm{E}\left( (\theta_{\mathrm{other}}(X) - \theta_{\mathrm{MMSE}}(X) + \theta_{\mathrm{MMSE}}(X) - \Theta)^2 \mid X = x \right)$$
$$= (\theta_{\mathrm{other}}(x) - \theta_{\mathrm{MMSE}}(x))^2 + \mathrm{E}\left( (\theta_{\mathrm{MMSE}}(X) - \Theta)^2 \mid X = x \right) + 2\, (\theta_{\mathrm{other}}(x) - \theta_{\mathrm{MMSE}}(x)) \left( \theta_{\mathrm{MMSE}}(x) - \mathrm{E}(\Theta \mid X = x) \right)$$
$$= (\theta_{\mathrm{other}}(x) - \theta_{\mathrm{MMSE}}(x))^2 + \mathrm{E}\left( (\theta_{\mathrm{MMSE}}(X) - \Theta)^2 \mid X = x \right),$$
where the cross term vanishes because $\theta_{\mathrm{MMSE}}(x) = \mathrm{E}(\Theta \mid X = x)$. By iterated expectation,
$$\mathrm{E}\left( (\theta_{\mathrm{other}}(X) - \Theta)^2 \right) = \mathrm{E}\left( \mathrm{E}\left( (\theta_{\mathrm{other}}(X) - \Theta)^2 \mid X \right) \right)$$
$$= \mathrm{E}\left( (\theta_{\mathrm{other}}(X) - \theta_{\mathrm{MMSE}}(X))^2 \right) + \mathrm{E}\left( \mathrm{E}\left( (\theta_{\mathrm{MMSE}}(X) - \Theta)^2 \mid X \right) \right)$$
$$= \mathrm{E}\left( (\theta_{\mathrm{other}}(X) - \theta_{\mathrm{MMSE}}(X))^2 \right) + \mathrm{E}\left( (\theta_{\mathrm{MMSE}}(X) - \Theta)^2 \right)$$
$$\ge \mathrm{E}\left( (\theta_{\mathrm{MMSE}}(X) - \Theta)^2 \right),$$
since the expectation of a nonnegative quantity is nonnegative.

Example 3.2 (Bernoulli distribution, continued). In order to obtain point estimates for the parameter in Example 1.1 we compute the posterior means:
$$\mathrm{E}(\Theta_1 \mid X = x) = \int_0^1 \theta\, f_{\Theta_1 \mid X}(\theta \mid x)\, d\theta = \frac{\int_0^1 \theta^{n_1 + 1} (1-\theta)^{n_0}\, d\theta}{\beta(n_1 + 1, n_0 + 1)} = \frac{\beta(n_1 + 2, n_0 + 1)}{\beta(n_1 + 1, n_0 + 1)},$$
$$\mathrm{E}(\Theta_2 \mid X = x) = \int_0^1 \theta\, f_{\Theta_2 \mid X}(\theta \mid x)\, d\theta = \frac{\beta(n_1 + 3, n_0 + 1)}{\beta(n_1 + 2, n_0 + 1)}.$$
Figure 1 shows the posterior means for different values of $n_0$ and $n_1$.
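The beta-function ratios in Example 3.2 are easy to evaluate numerically, for instance through `scipy.special.betaln` (a minimal sketch; the counts below are hypothetical):

```python
import numpy as np
from scipy.special import betaln

def posterior_means(n0, n1):
    """Posterior means of Theta_1 (uniform prior) and Theta_2 (skewed prior)."""
    # E[Theta_1 | X = x] = beta(n1 + 2, n0 + 1) / beta(n1 + 1, n0 + 1)
    mean1 = np.exp(betaln(n1 + 2, n0 + 1) - betaln(n1 + 1, n0 + 1))
    # E[Theta_2 | X = x] = beta(n1 + 3, n0 + 1) / beta(n1 + 2, n0 + 1)
    mean2 = np.exp(betaln(n1 + 3, n0 + 1) - betaln(n1 + 2, n0 + 1))
    return mean1, mean2

for n0, n1 in [(3, 1), (30, 10), (300, 100)]:  # hypothetical counts
    mean1, mean2 = posterior_means(n0, n1)
    print(f"n0={n0:3d} n1={n1:3d}  E[Theta_1|x]={mean1:.4f}  "
          f"E[Theta_2|x]={mean2:.4f}  ML={n1 / (n0 + n1):.4f}")
```

Both posterior means approach the ML estimate as the number of observations grows, mirroring the behavior of the posterior densities in Figure 1.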
3.2 Maximum-a-posteriori estimation

An alternative to the posterior mean is the posterior mode, which is the maximum of the pdf or the pmf of the posterior distribution.

Definition 3.3 (Maximum-a-posteriori estimator). The maximum-a-posteriori (MAP) estimator of a parameter $\Theta$, given data $x$ modeled as a realization of a random vector $X$, is
$$\theta_{\mathrm{MAP}}(x) := \arg\max_\theta\, p_{\Theta \mid X}(\theta \mid x)$$
if $\Theta$ is modeled as a discrete random variable and
$$\theta_{\mathrm{MAP}}(x) := \arg\max_\theta\, f_{\Theta \mid X}(\theta \mid x)$$
if it is modeled as a continuous random variable.

In Figure 1, the ML estimator of the parameter coincides with the mode (maximum value) of the posterior distribution obtained with the uniform prior. This is not a coincidence: under a uniform prior the MAP and ML estimates are the same.

Lemma 3.4. The maximum-likelihood estimator of a parameter $\Theta$ is the mode (maximum value) of the pdf of the posterior distribution given the data $X$ if its prior distribution is uniform.

Proof. We prove the result when the model for the data and the parameters is continuous; if either or both of them are discrete the proof is identical (in that case the ML estimator is the mode of the pmf of the posterior). If the prior distribution of the parameters is uniform, then $f_\Theta(\theta)$ is constant for any $\theta$, which implies
$$\arg\max_\theta\, f_{\Theta \mid X}(\theta \mid x) = \arg\max_\theta\, \frac{f_{X \mid \Theta}(x \mid \theta)\, f_\Theta(\theta)}{\int_u f_{X \mid \Theta}(x \mid u)\, f_\Theta(u)\, du} = \arg\max_\theta\, f_{X \mid \Theta}(x \mid \theta) = \arg\max_\theta\, \mathcal{L}_x(\theta),$$
since the remaining terms do not depend on $\theta$.
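Lemma 3.4 can be checked numerically for the Bernoulli model of Example 1.1: with a uniform prior, maximizing the posterior over a grid of $\theta$ values returns the same point as maximizing the likelihood. A minimal sketch with hypothetical counts:

```python
import numpy as np

n0, n1 = 7, 3                               # hypothetical counts of zeros and ones
theta = np.linspace(1e-6, 1 - 1e-6, 100001)

log_likelihood = n1 * np.log(theta) + n0 * np.log(1 - theta)
log_prior = np.zeros_like(theta)            # uniform prior: constant log-density on [0, 1]
log_posterior = log_likelihood + log_prior  # up to the additive constant -log p_X(x)

print("ML estimate: ", theta[np.argmax(log_likelihood)])  # close to n1 / (n0 + n1) = 0.3
print("MAP estimate:", theta[np.argmax(log_posterior)])   # same grid point
```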
Note that uniform priors are only well defined in situations where the parameter is restricted to a bounded set.

We now describe a situation in which the MAP estimator is optimal. If the parameter $\Theta$ can only take a discrete set of values, then the MAP estimator minimizes the probability of making the wrong choice.

Theorem 3.5 (The MAP estimator minimizes the probability of error). Let $\Theta$ be a discrete random vector and let $X$ be a random vector modeling the data. We define
$$\theta_{\mathrm{MAP}}(x) := \arg\max_\theta\, p_{\Theta \mid X}(\theta \mid x).$$
For any arbitrary estimator $\theta_{\mathrm{other}}(x)$,
$$\mathrm{P}\left( \theta_{\mathrm{other}}(X) \ne \Theta \right) \ge \mathrm{P}\left( \theta_{\mathrm{MAP}}(X) \ne \Theta \right).$$
In words, the MAP estimator minimizes the probability of error.

Proof. We assume that $X$ is a continuous random vector, but the same argument applies if it is discrete. We have
$$\mathrm{P}\left( \Theta = \theta_{\mathrm{other}}(X) \right) = \int_x f_X(x)\, \mathrm{P}\left( \Theta = \theta_{\mathrm{other}}(x) \mid X = x \right)\, dx = \int_x f_X(x)\, p_{\Theta \mid X}(\theta_{\mathrm{other}}(x) \mid x)\, dx$$
$$\le \int_x f_X(x)\, p_{\Theta \mid X}(\theta_{\mathrm{MAP}}(x) \mid x)\, dx = \mathrm{P}\left( \Theta = \theta_{\mathrm{MAP}}(X) \right),$$
where the inequality follows from the definition of the MAP estimator as the mode of the posterior. The probability of a correct decision is therefore largest for the MAP estimator, which is equivalent to the statement of the theorem.

Example 3.6 (Sending bits). We consider a very simple model for a communication channel in which we aim to send a signal $\Theta$ consisting of a single bit. Our prior knowledge indicates that the signal is equal to one with probability $1/4$:
$$p_\Theta(1) = \tfrac{1}{4}, \qquad p_\Theta(0) = \tfrac{3}{4}.$$
Due to the presence of noise in the channel, we send the signal $n$ times. At the receptor we observe
$$X_i = \Theta + Z_i, \qquad 1 \le i \le n,$$
where $Z_1, \ldots, Z_n$ are iid standard Gaussian random variables. Modeling perturbations as Gaussian is a popular choice in communications. It is justified by the central limit theorem, under the assumption that the noise is a combination of many small effects that are approximately independent.

We will now compute and compare the ML and MAP estimators of $\Theta$ given the observations. The likelihood is equal to
$$\mathcal{L}_x(\theta) = \prod_{i=1}^n f_{X_i \mid \Theta}(x_i \mid \theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x_i - \theta)^2}{2}}.$$
It is easier to deal with the log-likelihood function,
$$\log \mathcal{L}_x(\theta) = -\frac{1}{2} \sum_{i=1}^n (x_i - \theta)^2 - \frac{n}{2} \log(2\pi).$$
Since $\Theta$ only takes two values, we can compare directly. We will choose $\theta_{\mathrm{ML}}(x) = 1$ if
$$\log \mathcal{L}_x(1) = -\frac{1}{2} \sum_{i=1}^n x_i^2 + \sum_{i=1}^n x_i - \frac{n}{2} - \frac{n}{2} \log(2\pi) \ge -\frac{1}{2} \sum_{i=1}^n x_i^2 - \frac{n}{2} \log(2\pi) = \log \mathcal{L}_x(0).$$
Equivalently,
$$\theta_{\mathrm{ML}}(x) = \begin{cases} 1 & \text{if } \frac{1}{n} \sum_{i=1}^n x_i > \frac{1}{2}, \\ 0 & \text{otherwise.} \end{cases}$$
The rule makes a lot of sense: if the sample mean of the data is closer to 1 than to 0, then our estimate is equal to 1. By the law of total probability, the probability of error of this estimator is equal to
$$\mathrm{P}\left( \Theta \ne \theta_{\mathrm{ML}}(X) \right) = \mathrm{P}\left( \Theta \ne \theta_{\mathrm{ML}}(X) \mid \Theta = 0 \right) \mathrm{P}(\Theta = 0) + \mathrm{P}\left( \Theta \ne \theta_{\mathrm{ML}}(X) \mid \Theta = 1 \right) \mathrm{P}(\Theta = 1)$$
$$= \mathrm{P}\left( \tfrac{1}{n} \textstyle\sum_i X_i > \tfrac{1}{2} \mid \Theta = 0 \right) \mathrm{P}(\Theta = 0) + \mathrm{P}\left( \tfrac{1}{n} \textstyle\sum_i X_i < \tfrac{1}{2} \mid \Theta = 1 \right) \mathrm{P}(\Theta = 1) = Q\left( \sqrt{n}/2 \right),$$
where the last equality follows from the fact that, conditioned on $\Theta = \theta$, the empirical mean is Gaussian with mean $\theta$ and variance $1/n$ (see the proof of Theorem 3.2 in Lecture Notes 6), and $Q$ denotes the tail probability of a standard Gaussian.
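The expression $Q(\sqrt{n}/2)$ can be checked by simulation. The sketch below draws $\Theta$ from the prior, applies the ML rule, and compares the empirical error rate with the formula (using the standard normal survival function from scipy for $Q$):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials = 10, 200_000

theta = rng.random(trials) < 0.25                        # prior: P(Theta = 1) = 1/4
x = theta[:, None] + rng.standard_normal((trials, n))    # X_i = Theta + Z_i
theta_ml = x.mean(axis=1) > 0.5                          # ML rule: sample mean above 1/2

print("empirical error:", np.mean(theta_ml != theta))
print("Q(sqrt(n)/2):   ", norm.sf(np.sqrt(n) / 2))       # about 0.057 for n = 10
```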
To compute the MAP estimate we must find the maximum of the posterior distribution of $\Theta$ given the observed data. Equivalently, we find the maximum of the logarithm of the posterior pmf (this is equivalent because the logarithm is a monotone function),
$$\log p_{\Theta \mid X}(\theta \mid x) = \log \frac{\prod_{i=1}^n f_{X_i \mid \Theta}(x_i \mid \theta)\, p_\Theta(\theta)}{f_X(x)} = \sum_{i=1}^n \log f_{X_i \mid \Theta}(x_i \mid \theta) + \log p_\Theta(\theta) - \log f_X(x)$$
$$= -\frac{1}{2} \sum_{i=1}^n (x_i - \theta)^2 - \frac{n}{2} \log(2\pi) + \log p_\Theta(\theta) - \log f_X(x).$$
We compare the value of this function for the two possible values of $\Theta$, 0 and 1. We choose $\theta_{\mathrm{MAP}}(x) = 1$ if
$$\log p_{\Theta \mid X}(1 \mid x) + \log f_X(x) = -\frac{1}{2} \sum_{i=1}^n x_i^2 + \sum_{i=1}^n x_i - \frac{n}{2} - \frac{n}{2} \log(2\pi) - \log 4$$
$$\ge -\frac{1}{2} \sum_{i=1}^n x_i^2 - \frac{n}{2} \log(2\pi) - \log 4 + \log 3 = \log p_{\Theta \mid X}(0 \mid x) + \log f_X(x).$$
Equivalently,
$$\theta_{\mathrm{MAP}}(x) = \begin{cases} 1 & \text{if } \frac{1}{n} \sum_{i=1}^n x_i > \frac{1}{2} + \frac{\log 3}{n}, \\ 0 & \text{otherwise.} \end{cases}$$
The MAP estimate shifts the threshold with respect to the ML estimate to take into account that $\Theta$ is more prone to equal zero. However, the correction term tends to zero as we gather more evidence, so if a lot of data is available the two estimators will be very similar. The probability of error of the MAP estimator is equal to
$$\mathrm{P}\left( \Theta \ne \theta_{\mathrm{MAP}}(X) \right) = \mathrm{P}\left( \Theta \ne \theta_{\mathrm{MAP}}(X) \mid \Theta = 0 \right) \mathrm{P}(\Theta = 0) + \mathrm{P}\left( \Theta \ne \theta_{\mathrm{MAP}}(X) \mid \Theta = 1 \right) \mathrm{P}(\Theta = 1)$$
$$= \mathrm{P}\left( \tfrac{1}{n} \textstyle\sum_i X_i > \tfrac{1}{2} + \tfrac{\log 3}{n} \mid \Theta = 0 \right) \mathrm{P}(\Theta = 0) + \mathrm{P}\left( \tfrac{1}{n} \textstyle\sum_i X_i < \tfrac{1}{2} + \tfrac{\log 3}{n} \mid \Theta = 1 \right) \mathrm{P}(\Theta = 1)$$
$$= \frac{3}{4}\, Q\left( \frac{\sqrt{n}}{2} + \frac{\log 3}{\sqrt{n}} \right) + \frac{1}{4}\, Q\left( \frac{\sqrt{n}}{2} - \frac{\log 3}{\sqrt{n}} \right).$$
[Figure 3: Probability of error of the ML and MAP estimators in Example 3.6 for different values of $n$.]

We compare the probability of error of the ML and MAP estimators in Figure 3. MAP estimation results in better performance, but the difference becomes small as $n$ increases.
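The two curves in Figure 3 can be reproduced directly from the closed-form error probabilities derived above (a short sketch, again writing $Q$ via scipy's normal survival function):

```python
import numpy as np
from scipy.stats import norm

def p_error_ml(n):
    """Probability of error of the ML estimator: Q(sqrt(n)/2)."""
    return norm.sf(np.sqrt(n) / 2)

def p_error_map(n):
    """Probability of error of the MAP estimator from Example 3.6."""
    s = np.sqrt(n)
    shift = np.log(3) / s
    # (3/4) Q(sqrt(n)/2 + log(3)/sqrt(n)) + (1/4) Q(sqrt(n)/2 - log(3)/sqrt(n))
    return 0.75 * norm.sf(s / 2 + shift) + 0.25 * norm.sf(s / 2 - shift)

for n in [1, 2, 5, 10, 20, 50]:
    print(f"n={n:3d}  ML error={p_error_ml(n):.4f}  MAP error={p_error_map(n):.4f}")
```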