Computational Cognitive Science
1 Computational Cognitive Science, Lecture 8. Frank Keller, School of Informatics, University of Edinburgh. Based on slides by Sharon Goldwater. October 14, 2016.
2 Outline:
1. Background: Cognition as Inference; Probability Distributions
2. Bayes' Rule; Comparing Infinitely Many Hypotheses
3. Maximum Likelihood Estimation; Maximum a Posteriori Estimation; Bayesian Integration
4. Choosing a Prior; Conjugate Priors
Reading: Griffiths and Yuille (2006).
3 Cognition as Inference. The story of probabilistic cognitive modeling so far: models define probabilities that correspond to some aspect of human behavior; for example, $P(R_i = A_i)$, the probability of assigning category A to item i in the GCM. Models have parameters that determine these probability distributions (e.g., the scaling factor c in the GCM), and maximum likelihood estimation is a way of setting these parameters: we infer probability distributions from data. So are probabilities and parameter estimators just technical devices, or do they have a cognitive status in our model?
4 Cognition as Inference. The recent literature assumes that probabilities and estimation are cognitively real. The intuitions behind this are: probabilities reflect degrees of belief; humans make observations from which they infer the probabilities on which their behavior is based; so humans also use estimation techniques. But which ones? Maximum likelihood estimation? Intuitively, inference is cognitively plausible if: estimates depend on observations, but also on prior beliefs; as more observations accrue, estimates become more reliable; and when observations are unreliable, prior beliefs are used instead. Today we will discuss the mathematics behind these intuitions.
6 Distributions. Let's recap the distinction between discrete and continuous distributions. Discrete distributions: the sample space S is finite or countably infinite (e.g., the integers); the distribution is a probability mass function, which defines the probability of a random variable taking on a particular value. Example (binomial distribution): $P(x) = \binom{n}{x}\,\theta^x (1-\theta)^{n-x}$. [Figure: the pmf $b(x; 12, 0.5)$.]
7 Distributions. We have also seen examples of continuous distributions: the sample space is uncountably infinite (e.g., the real numbers); the distribution is a probability density function, which defines the probabilities of intervals of the random variable. Example (exponential distribution): $P(x) = \frac{1}{\theta}\, e^{-x/\theta}$.
8 Discrete vs. Continuous. Discrete distributions: $P(X = x) \geq 0$ for all $x \in S$; $\sum_{x \in S} P(x) = 1$; law of total probability: $P(Y) = \sum_i P(Y \mid X_i)\, P(X_i)$; expectation: $E[X] = \sum_x x\, P(X = x)$. Continuous distributions: $P(x) \geq 0$ for all $x \in \mathbb{R}$; $\int P(x)\,dx = 1$; law of total probability: $P(y) = \int P(y \mid x)\, P(x)\,dx$; expectation: $E[X] = \int x\, P(x)\,dx$.
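As a sanity check on these rules (not part of the slides), here is a minimal Python sketch, assuming NumPy and SciPy are available; the parameter choices (n = 12, θ = 0.5 for the binomial, θ = 2 for the exponential) are illustrative:

```python
# Sketch: normalisation and expectation for one discrete and one continuous distribution.
import numpy as np
from scipy import stats, integrate

# Discrete: binomial b(x; 12, 0.5)
n, theta = 12, 0.5
xs = np.arange(n + 1)
pmf = stats.binom.pmf(xs, n, theta)
print(pmf.sum())           # probabilities sum to 1
print((xs * pmf).sum())    # E[X] = n * theta = 6

# Continuous: exponential with density (1/theta) * exp(-x/theta), theta = 2
theta = 2.0
pdf = lambda x: (1.0 / theta) * np.exp(-x / theta)
print(integrate.quad(pdf, 0, np.inf)[0])                    # integrates to 1
print(integrate.quad(lambda x: x * pdf(x), 0, np.inf)[0])   # E[X] = theta = 2
```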
10 Bayes' Rule. In its general form, the inference task consists of determining the probability of a hypothesis given some data. Notation: h is the hypothesis we are interested in; H is the hypothesis space (the set of all possible hypotheses); y is the observed data (note we use y rather than d). According to Bayes' rule: $P(h \mid y) = \frac{P(y \mid h)\, P(h)}{P(y)}$ where $P(h \mid y)$ is the posterior, $P(y \mid h)$ the likelihood, and $P(h)$ the prior. We can compute the denominator using the law of total probability: $P(y) = \sum_{h' \in H} P(y \mid h')\, P(h')$.
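As an illustration (my own, not from the slides), a minimal Python sketch of Bayes' rule over a finite hypothesis space, with the denominator obtained via the law of total probability; the function name and data structures are assumptions:

```python
# Sketch: posterior over a finite hypothesis space via Bayes' rule.
def posterior(hypotheses, prior, likelihood):
    """prior[h] = P(h), likelihood[h] = P(y | h); returns P(h | y) for each h."""
    p_y = sum(likelihood[h] * prior[h] for h in hypotheses)  # law of total probability
    return {h: likelihood[h] * prior[h] / p_y for h in hypotheses}
```

The coin example below is an instance of exactly this computation.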
15 Bayes' Rule. Example: a box contains two coins, one that comes up heads 50% of the time, and one that comes up heads 90% of the time. You pick one of the coins, flip it 10 times, and observe HHHHHHHHHH. Which coin was flipped? What if you had observed HHTHTHTTHT? Let θ be the probability that the coin comes up heads, so we have two hypotheses: $h_0: \theta = 0.5$ and $h_1: \theta = 0.9$. The probability of a sequence y with $N_H$ heads and $N_T$ tails is: $P(y \mid \theta) = \theta^{N_H} (1-\theta)^{N_T}$. This is a Bernoulli distribution (a special case of the binomial distribution).
17 Bayes' Rule. We can compare the probabilities of the two hypotheses directly by computing the odds: $\frac{P(h_1 \mid y)}{P(h_0 \mid y)} = \frac{P(y \mid h_1)}{P(y \mid h_0)} \cdot \frac{P(h_1)}{P(h_0)}$ where the left-hand side is the posterior odds, the first factor on the right is the likelihood ratio, and the second is the prior odds. We get posterior odds of 357:1 in favor of $h_1$ for HHHHHHHHHH and 165:1 in favor of $h_0$ for HHTHTHTTHT.
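A small sketch (again my own, not from the slides) reproducing these posterior odds numerically, assuming equal prior probability of picking either coin:

```python
# Sketch: posterior odds for h1 (theta = 0.9) vs. h0 (theta = 0.5).
def seq_likelihood(seq, theta):
    n_h, n_t = seq.count("H"), seq.count("T")
    return theta ** n_h * (1 - theta) ** n_t

for seq in ["HHHHHHHHHH", "HHTHTHTTHT"]:
    prior_odds = 0.5 / 0.5                       # uniform prior over the two coins
    likelihood_ratio = seq_likelihood(seq, 0.9) / seq_likelihood(seq, 0.5)
    print(seq, likelihood_ratio * prior_odds)    # ~357 for the first, ~1/165 for the second
```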
22 Comparing Infinitely Many Hypotheses. Let's now assume that θ, the probability of the coin coming up heads, can be anywhere between 0 and 1. Now we have infinitely many hypotheses, but Bayes' rule still applies: $P(\theta \mid y) = \frac{P(y \mid \theta)\, P(\theta)}{P(y)}$ where the probability of the data is: $P(y) = \int_0^1 P(y \mid \theta)\, P(\theta)\,d\theta$. But how do we compute θ? There are three options.
23 Maximum Likelihood Estimation. 1. Choose the θ that makes y most probable, i.e., ignore P(θ): $\hat\theta = \arg\max_\theta P(y \mid \theta)$. This is the maximum likelihood (ML) estimate of θ. Problem: the ML estimate often does not generalize well (it overfits the data). It is a point estimate, and hence fails to take the shape of the posterior distribution into account.
24 Maximum a Posteriori Estimation. 2. Choose the θ that is most probable given y: $\hat\theta = \arg\max_\theta P(\theta \mid y) = \arg\max_\theta P(y \mid \theta)\, P(\theta)$. This is the maximum a posteriori (MAP) estimate of θ, and it is equivalent to the ML estimate when P(θ) is uniform. Non-uniform priors can reduce overfitting, but the MAP estimate still doesn't account for the shape of $P(\theta \mid y)$.
25 Bayesian Integration. 3. Instead of maximizing, take the expected value of θ: $E[\theta] = \int_0^1 \theta\, P(\theta \mid y)\,d\theta = \int_0^1 \theta\, \frac{P(y \mid \theta)\, P(\theta)}{P(y)}\,d\theta \propto \int_0^1 \theta\, P(y \mid \theta)\, P(\theta)\,d\theta$. This is the posterior mean, the average over all hypotheses. For our coin flip example (with a uniform prior), the posterior is: $P(\theta \mid y) = \frac{(N_H + N_T + 1)!}{N_H!\, N_T!}\,\theta^{N_H} (1-\theta)^{N_T} = \mathrm{Beta}(N_H + 1, N_T + 1)$. This is known as the Beta distribution.
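A quick numerical check (my own illustration, assuming SciPy; the counts 7 heads and 3 tails are arbitrary) that integrating θ against the Beta posterior gives the closed-form posterior mean derived below:

```python
# Sketch: posterior mean by numerical integration vs. the closed form.
from scipy import stats, integrate

n_h, n_t = 7, 3
post = stats.beta(n_h + 1, n_t + 1)     # P(theta | y) under a uniform prior
mean_numeric = integrate.quad(lambda t: t * post.pdf(t), 0, 1)[0]
print(mean_numeric)                     # ≈ 0.6667
print((n_h + 1) / (n_h + n_t + 2))      # closed form given below: 8/12 ≈ 0.6667
```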
27 Beta Distribution. [Figure: Beta distribution plots.]
28 Maximum Likelihood Estimate. Using the Beta distribution, the ML estimate (equivalent to the MAP estimate with a uniform prior) works out as: $\hat\theta = \frac{N_H}{N_H + N_T}$. This is a relative frequency estimate: it's simply the frequency of heads over the total number of coin flips. This estimate is insensitive to sample size: if we get 10 heads and 0 tails, we are as certain about θ as if we get 100 heads and 0 tails. This explains the overfitting.
29 Posterior Mean. Let's compare this with the posterior mean, which for the Beta distribution works out as: $E[\theta] = \frac{N_H + 1}{N_H + N_T + 2}$. This is the average over all values of θ. It pays attention to sample size (compare E[θ] for 10 heads and 0 tails vs. 100 heads and 0 tails), and it is less prone to overfitting. We can think of this as adding pseudocounts to the relative frequency estimate; this is called smoothing. Note that we are still assuming a uniform prior!
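A quick sketch (my own illustration, using only the two formulas above) comparing the two estimates for 10 vs. 100 heads with no tails:

```python
# Sketch: ML estimate vs. posterior mean (uniform prior) for all-heads data.
def ml_estimate(n_h, n_t):
    return n_h / (n_h + n_t)

def posterior_mean(n_h, n_t):
    return (n_h + 1) / (n_h + n_t + 2)

for n_h in (10, 100):
    print(n_h, ml_estimate(n_h, 0), posterior_mean(n_h, 0))
# ML is 1.0 in both cases; the posterior mean is 11/12 ≈ 0.917 vs. 101/102 ≈ 0.990,
# so it becomes more confident as the sample grows.
```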
30 Choosing a Prior. Let's assume we want to use a non-uniform prior. We could again use the Beta distribution: $P(\theta) = \mathrm{Beta}(V_H + 1, V_T + 1)$ where $V_H, V_T > 1$ encode our beliefs about likely values of θ. This distribution has a mean of $(V_H + 1)/(V_H + V_T + 2)$ and becomes concentrated around the mean as $V_H + V_T$ increases. For example, $V_H = V_T = 1000$ puts a strong prior on θ = 0.5. The parameters that govern the prior distribution are called hyperparameters (here, $V_H$ and $V_T$ are hyperparameters).
31 Choosing a Prior. Using the $\mathrm{Beta}(V_H + 1, V_T + 1)$ prior, the posterior distribution becomes: $P(\theta \mid y) = \frac{(N_H + N_T + V_H + V_T + 1)!}{(N_H + V_H)!\,(N_T + V_T)!}\,\theta^{N_H + V_H} (1-\theta)^{N_T + V_T}$ which is $\mathrm{Beta}(N_H + V_H + 1, N_T + V_T + 1)$. The MAP estimate for this posterior is then: $\hat\theta = \frac{N_H + V_H}{N_H + N_T + V_H + V_T}$ and the posterior mean becomes: $E[\theta] = \frac{N_H + V_H + 1}{N_H + N_T + V_H + V_T + 2}$.
32 Choosing a Prior. Returning to our example, if we use a Beta prior with $V_H = V_T = 1000$, and our data consist of a sequence of 10 heads and 0 tails, then: $E[\theta] = \frac{N_H + V_H + 1}{N_H + N_T + V_H + V_T + 2} = \frac{1011}{2012} \approx 0.5$. So we retain our belief that θ = 0.5, even though we've seen strong evidence to the contrary. This would change had we seen 100 heads rather than 10. Compare this to the maximum likelihood estimate, which is: $\hat\theta = \frac{N_H}{N_H + N_T} = 1$.
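A sketch of the same calculation (the helper name is mine, not from the slides); it also shows how the posterior mean starts to shift with 100 heads, while the ML estimate is already at 1:

```python
# Sketch: posterior mean with a Beta(V_H + 1, V_T + 1) pseudocount prior.
def posterior_mean_with_prior(n_h, n_t, v_h, v_t):
    return (n_h + v_h + 1) / (n_h + n_t + v_h + v_t + 2)

print(posterior_mean_with_prior(10, 0, 1000, 1000))    # ≈ 0.5025: the prior dominates
print(posterior_mean_with_prior(100, 0, 1000, 1000))   # ≈ 0.524: belief starts to shift
print(10 / (10 + 0))                                   # ML estimate: 1.0
```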
33 Conjugate Priors. The likelihood was Bernoulli distributed and the prior Beta distributed; this ensured that the posterior was also Beta distributed. This is because the Bernoulli and the Beta distribution are conjugate distributions. Using a conjugate prior can make the computation of the posterior tractable (e.g., by ensuring that there is an analytic solution). Likelihood and conjugate prior pairs: Bernoulli / Beta; Binomial / Beta; Multinomial / Dirichlet; Normal / Normal.
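Because prior and posterior lie in the same family, conjugate updating amounts to adding the observed counts to the prior's parameters. A minimal sketch (assuming SciPy is available; the numbers reuse the pseudocount example above):

```python
# Sketch: Beta prior + Bernoulli likelihood -> Beta posterior (conjugacy).
from scipy import stats

a, b = 1001, 1001                      # Beta(V_H + 1, V_T + 1) with V_H = V_T = 1000
n_h, n_t = 10, 0                       # observed heads and tails
posterior = stats.beta(a + n_h, b + n_t)
print(posterior.mean())                # ≈ 0.5025, matching the closed form above
```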
34 Summary. Cognitive tasks can be modeled as probabilistic inference; using Bayes' rule, inference can be broken down into posterior, likelihood, and prior distributions; standard techniques such as maximum likelihood estimation or MAP generate point estimates of the parameters; Bayesian techniques instead average (Bayesian integration) over all parameter values, which makes them less prone to overfitting and allows the use of informative priors; the prior distribution is typically chosen to be conjugate with the likelihood distribution.
35 References. Griffiths, Tom L. and Alan Yuille. 2006. A primer on probabilistic inference. Trends in Cognitive Sciences 10(7).