Computational Cognitive Science
Lecture 9: A Bayesian model of concept learning
Chris Lucas, School of Informatics, University of Edinburgh
October 16, 2018

Reading
Rules and Similarity in Concept Learning by Tenenbaum (2000; link)

Cognition as Inference
The story of probabilistic cognitive modeling so far: models define probabilities that correspond to some aspect of human behavior, e.g., the probability of assigning category A to item i in the GCM; beyond estimating parameters, we infer probability distributions from data, via conjugate priors, point estimates, Monte Carlo, and other approximation methods.

Cognition as Inference
We have focused on using probabilities and probabilistic inference in data analysis, but the models haven't ascribed priors or probabilistic beliefs to people. Today we'll discuss a model where probabilities aren't just a useful tool, but rather have a cognitive status.

Cognition as probabilistic inference
Many recent models assume that probabilities and estimation are cognitively real: we estimate and represent something like probabilities. Why? People act as if they have degrees of belief or certainty; humans must deal constantly with ambiguous and noisy information; and experimental evidence shows that people exploit and combine noisy information in an adaptive, graded way.

Cognition as probabilistic inference
People act as if they have degrees of belief or certainty. Example: Alice has a coin that might be two-headed. Alice flips the coin four times and it comes up HHHH. Consider the following bets: Would you take an even bet that the coin will come up heads on the next flip? Would you bet 8 pounds against a profit of 1 pound? Would you bet your life against a profit of 1 pound?
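To make the intuition concrete, here is a minimal worked update. The prior is an assumption for illustration (it is not given on the slide): suppose we initially think it equally likely that the coin is fair or two-headed. Then

\[ P(\text{two-headed} \mid \text{HHHH}) = \frac{\tfrac{1}{2} \cdot 1}{\tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot (\tfrac{1}{2})^4} = \frac{16}{17} \approx 0.94 \]

High, but not certain, which is why the three bets feel increasingly uncomfortable.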

Cognition as probabilistic inference
Humans must deal constantly with ambiguous and noisy information, e.g., "A common light-prior for visual search, shape, and reflectance judgments" (Adams, 2007).

Cognition as probabilistic inference
Humans must deal constantly with ambiguous and noisy information, e.g., the ambiguous headline "Infant Pulled from Wrecked Car Involved in Short Police Pursuit" (Language Log, http://languagelog.ldc.upenn.edu/nll/?p=4441).

Cognition as probabilistic inference
People exploit and combine noisy information in an adaptive, graded way, e.g., estimating motor forces and visual patterns from noisy data; combining visual and motor feedback; learning about cause and effect in unreliable systems; learning about the traits, beliefs and desires of others from their actions; language learning.

Cognition as probabilistic inference
How do people represent and exploit information about probabilities? Intuitively: our inferences depend on observations, but also on prior beliefs; as more observations accrue, estimates become more reliable; when observations are unreliable, prior beliefs are used instead. Today we will discuss a model built on these intuitions.

Concepts vs. Categories
Tenenbaum (2000) addresses the question of how people quickly learn new concepts: concepts can be concrete categories (dog, chair), or more vague ("healthy" level for a specific hormone, "ripe" for a pear); here, we will focus on number concepts ("odd number", "between 3 and 45"). Generalization is a key feature of concept learning: given a small number of positive examples, determine which other examples are also members of the concept.

Generalization
Given some examples of a concept, determine which other things belong to that concept. Two basic strategies: rule-based generalization: find a rule that describes the examples and apply it (deterministic predictions); similarity-based generalization: identify features of the examples and the new item, and decide based on how many features are shared (probabilistic predictions). People's judgments are consistent with both strategies, but in different circumstances.

Generalization
Tenenbaum presents a Bayesian model of concept learning: the model can exhibit both rule-based and similarity-based behavior, but it is not a hybrid model: it uses only one mechanism, and rules and similarity emerge as special cases; it explains how people can generalize from very few examples; Bayesian hypothesis averaging is a key feature of the model. The model is trained on data from number concept learning.

The Number Game
I think of a number concept (a subset of the numbers 1 to 100), e.g., odd numbers; powers of two; numbers between 23 and 34. I choose some examples of this concept at random and show them to you: {3, 57}; {16, 2, 8}; {25, 31, 24}. You guess what other numbers are also included in the concept.

Experimental Design
Subjects are told how the game works. Then, a few examples of the concept are presented: class I trials: only one example; class II trials: four examples, consistent with a simple mathematical rule; class III trials: four examples, similar in magnitude. Subjects then rate the probability that other numbers (randomly chosen from 1 to 100) are also part of the concept.

Class I Trials
Only one example is given (16 or 60). Results [figure: generalization ratings for X = 16 and X = 60]: responses are fairly uniform, but with slightly higher ratings for numbers that are similar in magnitude or similar mathematically: even numbers (both), powers of two (16), multiples of ten (60). Notes: stars show the examples given; missing bars are not zero, those numbers just were not queried.

Class II Trials
Four examples were given, consistent with a simple mathematical rule ({16, 8, 2, 64} or {60, 80, 10, 30}). Results [figure: generalization ratings for X = {16, 8, 2, 64} and X = {60, 80, 10, 30}]: responses reflect the most specific rule consistent with the examples; other numbers have a probability near zero. These rules are not the only logical possibility: {16, 8, 2, 64} could be examples of even numbers, for instance.

Class III Trials
Four examples were given that didn't follow a simple rule, but were similar in magnitude ({16, 23, 19, 20} or {60, 52, 57, 55}). Results [figure: generalization ratings for X = {16, 23, 19, 20} and X = {60, 52, 57, 55}]: responses reflect a similarity gradient by magnitude, with low probability for numbers more than a fixed distance away from the largest or smallest example.

Bayesian Model
Given data X = {x^(1), ..., x^(n)} sampled from concept C, we want to determine P(y ∈ C | X) for a new data point y. As in many inference problems, a hidden variable (C) determines the inference, but we don't know C, so we will average over it:
\[ P(y \in C \mid X) = \sum_{h \in H} P(y \in C \mid C = h)\, P(C = h \mid X) \]
To compute the posterior P(C = h | X), we need to decide: What is the hypothesis space H? What is the prior distribution over hypotheses? What is the likelihood function?

Hypothesis Space
In theory, all possible subsets of the numbers 1 to 100. The full space is too large, so we consider only salient subsets: subsets defined by mathematical properties (odds, evens, primes, squares, cubes, multiples and powers of small numbers, numbers with the same final digit); subsets defined by similar magnitude (intervals of consecutive numbers). Total: 5,083 hypotheses.
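As an illustration (this is a simplified sketch, not Tenenbaum's exact hypothesis space; the particular mathematical properties enumerated below are an assumed subset), such a space could be built like this:

```python
# Sketch: a simplified number-game hypothesis space over the numbers 1..100.
N = 100

def math_hypotheses():
    """Subsets defined by mathematical properties (illustrative, not exhaustive)."""
    hyps = {
        "odd":     {n for n in range(1, N + 1) if n % 2 == 1},
        "even":    {n for n in range(1, N + 1) if n % 2 == 0},
        "squares": {n * n for n in range(1, 11)},
    }
    for k in range(3, 13):                                   # multiples of small numbers
        hyps[f"multiples of {k}"] = set(range(k, N + 1, k))
    for b in range(2, 11):                                   # powers of small numbers
        hyps[f"powers of {b}"] = {b ** e for e in range(1, 8) if b ** e <= N}
    return hyps

def interval_hypotheses():
    """Subsets defined by similar magnitude: all intervals [a, b] within 1..100."""
    return {f"interval [{a},{b}]": set(range(a, b + 1))
            for a in range(1, N + 1) for b in range(a, N + 1)}
```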

Prior P(C = h)
First, assign a probability to each type of hypothesis: P(C is defined mathematically) = λ; P(C is defined as an interval) = 1 − λ. Use λ = 1/2. Then assign probabilities within these types: all mathematical hypotheses are equally probable; medium-sized intervals are more probable than small or large intervals, via an Erlang distribution:
\[ P(h) \propto (|h| / \sigma^2)\, e^{-|h| / \sigma} \]
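Continuing the sketch above, a minimal version of this prior (the value σ = 10 is an assumption for illustration; the slide only fixes λ = 1/2):

```python
import math

def prior(math_hyps, interval_hyps, lam=0.5, sigma=10.0):
    """Prior over hypotheses: probability lam that the concept is mathematical
    (spread uniformly), 1 - lam that it is an interval, with interval mass
    weighted by an Erlang distribution over interval size |h|."""
    p = {name: lam / len(math_hyps) for name in math_hyps}
    erlang = {name: (len(h) / sigma ** 2) * math.exp(-len(h) / sigma)
              for name, h in interval_hyps.items()}
    z = sum(erlang.values())                                  # normalise within intervals
    p.update({name: (1 - lam) * w / z for name, w in erlang.items()})
    return p                                                  # sums to 1 over all hypotheses
```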

Likelihood P(X | C = h)
Assume examples are sampled uniformly at random from C. For a hypothesis h containing |h| numbers, each number in h is drawn as an example with probability 1/|h|, so:
\[ P(X = x^{(1)} \ldots x^{(n)} \mid h) = \begin{cases} 1/|h|^n & \text{if } x^{(j)} \in h \text{ for all } j \\ 0 & \text{otherwise} \end{cases} \]
Example: for h = multiples of five, |h| = 20, so P(10, 35 | h) = 1/20². Size principle: for fixed data, smaller hypotheses have higher likelihood than larger hypotheses; as the data increase, smaller hypotheses have exponentially higher likelihood than larger ones.
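A minimal sketch of this size-principle likelihood (the function name is mine; the worked example reuses the numbers from the slide):

```python
def likelihood(examples, hyp_members):
    """P(X | h) under strong sampling: 1/|h|^n if every example lies in h, else 0."""
    if all(x in hyp_members for x in examples):
        return 1.0 / (len(hyp_members) ** len(examples))
    return 0.0

# Worked example from the slide: h = multiples of five, |h| = 20.
multiples_of_five = set(range(5, 101, 5))
assert likelihood([10, 35], multiples_of_five) == 1 / 400     # = 1/20^2
assert likelihood([10, 36], multiples_of_five) == 0.0         # 36 is not in h
```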

Inference over Posterior
Draw inferences by averaging over hypotheses:
\[ P(y \in C \mid X) = \sum_{h \in H} P(y \in C \mid C = h)\, P(C = h \mid X) \]
P(y ∈ C | C = h) is either 0 or 1. The posterior P(C = h | X) is computed using Bayes' rule, with likelihood and prior as defined above:
\[ P(C = h \mid X) = \frac{P(X \mid C = h)\, P(C = h)}{P(X)} \]
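Putting the pieces together, a sketch of the posterior and the averaged prediction, building on the prior and likelihood sketches above (function names are mine):

```python
def posterior(examples, hypotheses, prior_probs):
    """P(h | X) via Bayes' rule: proportional to P(X | h) P(h), normalised over h."""
    unnorm = {name: likelihood(examples, members) * prior_probs[name]
              for name, members in hypotheses.items()}
    z = sum(unnorm.values())                                  # this is P(X)
    return {name: w / z for name, w in unnorm.items()}

def p_in_concept(y, examples, hypotheses, prior_probs):
    """Hypothesis averaging: P(y in C | X) = posterior mass on hypotheses containing y."""
    post = posterior(examples, hypotheses, prior_probs)
    return sum(p for name, p in post.items() if y in hypotheses[name])

# Example usage with the sketched hypothesis space and prior:
# hyps = {**math_hypotheses(), **interval_hypotheses()}
# p0 = prior(math_hypotheses(), interval_hypotheses())
# p_in_concept(32, [16, 8, 2, 64], hyps, p0)   # close to 1: "powers of two" dominates
```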

Alternative Models
Similarity model (SIM): ignore the size principle and just look at consistency with hypotheses; the posterior doesn't sharpen as the sample grows. Effectively a 0/1 likelihood:
\[ P(X = x^{(1)} \ldots x^{(n)} \mid h) = \begin{cases} 1 & \text{if } x^{(j)} \in h \text{ for all } j \\ 0 & \text{otherwise} \end{cases} \]
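A sketch of the SIM variant's likelihood, which can be swapped into the posterior sketch above in place of the size-principle likelihood:

```python
def likelihood_sim(examples, hyp_members):
    """SIM: consistency-only (0/1) likelihood, with no size principle."""
    return 1.0 if all(x in hyp_members for x in examples) else 0.0
```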

Alternative Models
Rule-based model (MIN): replaces hypothesis averaging with a MAP estimate: always choose the highest-probability hypothesis; since the priors are weak, this is guided by the likelihood, so it always selects the smallest (most specific) consistent rule (size principle); this is reasonable when that rule (hypothesis) is much more probable than all others (Class II), but not reasonable when many hypotheses have similar probabilities (Classes I and III).
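A sketch of the MIN variant, reusing the posterior sketch above but committing to a single MAP hypothesis instead of averaging:

```python
def p_in_concept_min(y, examples, hypotheses, prior_probs):
    """MIN: generalise according to the single highest-posterior (MAP) hypothesis,
    which under the size principle is the smallest consistent rule."""
    post = posterior(examples, hypotheses, prior_probs)
    best = max(post, key=post.get)                            # MAP hypothesis
    return 1.0 if y in hypotheses[best] else 0.0
```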

Results
Humans vs. the pure similarity model (SIM) [figures: human and model generalization ratings for X = 16, X = {16, 8, 2, 64}, and X = {16, 23, 19, 20}].

Results
Humans vs. the pure rule-based model (MIN) [figures: human and model generalization ratings for X = 16, X = {16, 8, 2, 64}, and X = {16, 23, 19, 20}].

Results
Humans vs. the Bayesian model [figures: human and model generalization ratings for X = 16, X = {16, 8, 2, 64}, and X = {16, 23, 19, 20}].

Conclusions
Previous work suggested two different mechanisms for concept learning, rules or similarity, with no explanation for why one of them is used in any given case. The Bayesian model suggests these are two special cases of a single system implementing Bayesian inference. This result stems from an interaction between: hypothesis averaging, which yields similarity-like behavior when many hypotheses have similar probability; and the size principle, which yields rule-like behavior when one hypothesis is much more probable than the others. It is still possible, though, that these could be implemented in the brain with two different mechanisms.