A latent variable model of configural conditioning

A latent variable model of configural conditioning
Aaron C. Courville, Robotics Institute, CMU
Work with: Nathaniel D. Daw, UCL (Gatsby), David S. Touretzky, CMU, and Geoff Gordon, CMU

Similarity & Discrimination in Animal Learning
Similarity: How do animals respond to novel patterns of stimuli?
Discrimination: How do animals learn to discriminate between overlapping patterns of stimuli?
We recognize these issues as a tradeoff between generalization and data-fitting.

Perspectives on modeling conditioning
Discriminative vs. generative (latent state).
[Diagrams: a discriminative model in which bell, light, and tone predict food; a generative model in which a latent state generates bell, light, tone, and food.]

Models of Conditioning: Rescorla-Wagner (1972)
Predicts reinforcement intensity as a linear function of the stimuli, X = [A (light), B (bell), ...]:
$V = \sum_i w_i X_i$
The learning rule is gradient descent on the prediction error:
$\Delta w_i = \alpha_i \beta (r - V) X_i$
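A minimal sketch of this update in Python (the learning rates, saliences, and the A+/B+/AB- schedule below are illustrative assumptions, not values from the talk):

    import numpy as np

    def rescorla_wagner_update(w, x, r, alpha, beta=1.0):
        """One Rescorla-Wagner trial: w_i <- w_i + alpha_i * beta * (r - V) * x_i."""
        V = np.dot(w, x)                       # predicted reinforcement, V = sum_i w_i X_i
        return w + alpha * beta * (r - V) * x  # gradient step on the prediction error r - V

    # Illustrative negative-patterning schedule with elemental coding X = [A, B]
    w = np.zeros(2)
    alpha = np.full(2, 0.1)                    # assumed stimulus saliences
    trials = [([1, 0], 1.0), ([0, 1], 1.0), ([1, 1], 0.0)]  # A+, B+, AB-
    for _ in range(100):
        for x, r in trials:
            w = rescorla_wagner_update(w, np.array(x, float), r, alpha)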

Stimulus configurations
Configural conditioning: discrimination and generalization between patterns of stimuli.
Training (XOR): A+ B+ AB-
[Plots: animal responses (per min.) over trial blocks for A/B and AB; Rescorla-Wagner response strength over trial blocks for A/B and AB.]

Modeling Configurations
Two dominant perspectives:
1. Added elements [R&W, 1972]
2. Configural model [Pearce, 1994]
Both augment the stimulus representation with a configural unit, e.g. for XOR: X = [A, B, AB].
Which units are active on observing AB? (See the sketch after this list.)
[R&W, 1972]: All units fully present: X = [A=1, B=1, AB=1]
[Pearce, 1994]: Graded activation by a generalization rule: X = [A=.5, B=.5, AB=1]
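As a concrete illustration of the two representations (a sketch; the function names are just for illustration, and the Pearce rule below is a simplified reading in which a unit's activation is the fraction of its elements present times the fraction of present elements it covers, chosen so that observing AB gives X = [A=.5, B=.5, AB=1]):

    def added_elements(observed, units):
        """R&W added-elements coding: every unit contained in the observed pattern is fully on."""
        return {u: 1.0 if set(u) <= set(observed) else 0.0 for u in units}

    def pearce_activation(observed, units):
        """Graded configural activation under a simplified generalization rule."""
        obs = set(observed)
        acts = {}
        for u in units:
            common = len(set(u) & obs)
            acts[u] = (common / len(u)) * (common / len(obs))
        return acts

    units = ['A', 'B', 'AB']
    print(added_elements('AB', units))     # {'A': 1.0, 'B': 1.0, 'AB': 1.0}
    print(pearce_activation('AB', units))  # {'A': 0.5, 'B': 0.5, 'AB': 1.0}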

Expt. 1: Paired Compounds [Rescorla, 2003]
Training trials: AB+ CD+
Test trials:
  Trained: AB, CD
  Transfer: AD, BC
  Elements: A, B, C, D
[Plot: responses (per min.) to trained, transfer, and element probe stimuli.]

Modeling Paired Compounds
Training: AB+ CD+
[Plots: animal responses (per min.) and the response strengths predicted by Rescorla-Wagner and by Pearce, for trained, transfer, and element probe stimuli.]

Expt. 2: Asymmetric XOR [Redhead & Pearce, 1995]
Training trials: A+ BC+ ABC-
Test trials: A, BC, ABC
[Plot: responses (per min.) to A, BC, and ABC over trial blocks.]

Modeling Asymmetric XOR
Training: A+ BC+ ABC-
[Plots: animal responses (per min.) over trial blocks for A, BC, and ABC; response strengths over trial blocks predicted by Rescorla-Wagner and by Pearce for A+, BC+, and ABC-.]

Issues in Modeling Configural Conditioning
How do we choose between the two models?
Similarity: How to measure similarity between patterns of stimuli?
Discrimination: How do we choose a representation that is flexible enough?
A formal Bayesian approach can guide us.

Perspectives on modeling conditioning
Discriminative: model $P(R \mid A, B, C, D)$.
Generative: model the joint $P(R, A, B, C, D)$.
[Diagrams: a discriminative network with element and configural units A, B, C, D, AB, CD predicting R; a generative network with latent variables $x_1, x_2$ over A, B, C, D.]

A latent variable model
Generative model: a sigmoid belief network.
$P(S_i \mid x) = (1 + \exp(-w_i \cdot x))^{-1}$
Stimuli and latent variables are binary (on = 1, off = 0).
Latent variables correlate stimuli: they play the role of the configural unit.
[Diagram: latent variables $x_1, x_2$ with directed links to stimuli A, B, C, D.]
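A minimal sketch of this conditional distribution (assuming a plain weight-vector dot product with no bias term; the weight values below are illustrative):

    import numpy as np

    def p_stimuli_given_latents(W, x):
        """P(S_i = 1 | x) = 1 / (1 + exp(-w_i . x)) for each observable i.

        W : (num_stimuli, num_latents) weights of the sigmoid belief network
        x : (num_latents,) binary latent vector
        """
        return 1.0 / (1.0 + np.exp(-W @ x))

    W = np.array([[3.0, 0.0],   # A driven by x1
                  [3.0, 0.0],   # B driven by x1
                  [0.0, 3.0],   # C driven by x2
                  [0.0, 3.0]])  # D driven by x2
    print(p_stimuli_given_latents(W, np.array([1.0, 0.0])))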

Model Inference
Learning: $P(w, m \mid D)$
Prediction: $P(R \mid \text{Stim}, D)$
[Diagram: latent variables $x_1, x_2$ over stimuli A, B, C, D.]

Learning in the L.V. model
Learning = Bayesian inference over the weights and model structure, conditional on the training data:
$P(w_m, m \mid D) \propto P(D \mid w_m, m) P(w_m, m)$
The latent variables are unknown and unwanted, so we compute the marginal likelihood:
$P(D \mid w_m, m) = \prod_t \sum_x \prod_i P(S_{t,i} \mid x, w_m, m) P(x \mid w_m, m)$
[Diagram: latent variables $x_1, x_2$ over stimuli A, B, C, D.]
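A brute-force sketch of this marginal likelihood, enumerating the binary latent configurations (feasible only for a handful of latent variables); the independent-Bernoulli prior over x used here is an assumption standing in for the model's $P(x \mid w_m, m)$:

    import itertools
    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def log_marginal_likelihood(W, p_on, data):
        """log P(D | w_m, m) = sum_t log sum_x prod_i P(S_{t,i} | x) P(x).

        W    : (num_stimuli, num_latents) sigmoid belief net weights
        p_on : (num_latents,) assumed prior probability that each latent is on
        data : (num_trials, num_stimuli) binary numpy array of stimulus patterns
        """
        n_latent = W.shape[1]
        total = 0.0
        for s in data:                                   # one trial's stimulus pattern
            trial_lik = 0.0
            for bits in itertools.product([0, 1], repeat=n_latent):
                x = np.array(bits, dtype=float)
                p = sigmoid(W @ x)                       # P(S_i = 1 | x, w_m, m)
                lik = np.prod(np.where(s == 1, p, 1 - p))
                prior = np.prod(np.where(x == 1, p_on, 1 - p_on))
                trial_lik += lik * prior
            total += np.log(trial_lik)
        return total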

Approximate inference
Inference is analytically intractable: use reversible-jump MCMC.
Reversible-jump mixes slowly: use the exchange MCMC method to help.
[Diagram of the exchange MCMC scheme.]
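Not the talk's implementation, but a generic sketch of the exchange (tempered-swap) move such a scheme relies on, assuming chains run at inverse temperatures beta_k and that each chain's untempered log posterior is tracked:

    import numpy as np

    def exchange_move(states, log_posts, betas, rng):
        """Propose swapping the states of two adjacent tempered chains.

        states    : per-chain parameter states (e.g. (structure, weights) pairs)
        log_posts : untempered log posterior of each chain's current state
        betas     : inverse temperatures, with betas[0] = 1.0 for the target chain
        """
        k = rng.integers(len(states) - 1)
        log_alpha = (betas[k] - betas[k + 1]) * (log_posts[k + 1] - log_posts[k])
        if np.log(rng.random()) < log_alpha:             # Metropolis acceptance of the swap
            states[k], states[k + 1] = states[k + 1], states[k]
            log_posts[k], log_posts[k + 1] = log_posts[k + 1], log_posts[k]
        return states, log_posts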

L.V. model priors
Prior over the number of latent variables: Geometric(0.1)
Prior over weight magnitudes: Laplace(2.0)
Additional assumption: stimuli are a priori rare.
[Plots: the Geometric(0.1) prior over the number of latent variables and the Laplace(2.0) prior over weights.]
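A sketch of drawing a model from these priors; the exact parameterization (e.g. Laplace scale 2.0) and the negative stimulus bias encoding "stimuli are a priori rare" are assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_model_from_prior(num_stimuli):
        """Draw a model structure and weights from the slide's priors."""
        n_latent = rng.geometric(0.1)                        # Geometric(0.1) over the number of latents
        W = rng.laplace(loc=0.0, scale=2.0,                  # Laplace(2.0) over weight magnitudes
                        size=(num_stimuli, n_latent))
        bias = np.full(num_stimuli, -3.0)                    # assumed negative bias: stimuli a priori rare
        return n_latent, W, bias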

Prediction
Generalization => inference over the latents:
$P(R \mid A, B, m, w_m) = \sum_{x_1} P(R \mid x_1, m, w_m) P(x_1 \mid A, B, m, w_m)$
$P(x_1 \mid A, B, m, w_m) \propto P(A \mid x_1, m, w_m) P(B \mid x_1, m, w_m) P(x_1 \mid m, w_m)$
Posterior reinforcement prediction: marginalize over the choice of weights and model structure.
$P(R \mid \text{Stim}, m, D) = \int P(R \mid \text{Stim}, m, w_m, D) P(w_m \mid m, D)\, dw_m$
$P(R \mid \text{Stim}, D) = \sum_m P(R \mid \text{Stim}, m, D) P(m \mid D)$
[Diagram: latent variable $x_1$ over stimuli A and B.]
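With posterior samples of (m, w_m) from the MCMC chain, these integrals and sums reduce to a Monte Carlo average. A minimal sketch, where the predict_r callback (computing P(R | Stim, m, w_m) by summing over the latents as above) is an assumed helper:

    import numpy as np

    def posterior_prediction(samples, predict_r):
        """Estimate P(R | Stim, D) as the average of P(R | Stim, m, w_m) over posterior draws.

        samples   : iterable of (m, w_m) pairs drawn from P(m, w_m | D)
        predict_r : function (m, w_m) -> P(R | Stim, m, w_m)
        """
        return float(np.mean([predict_r(m, w_m) for m, w_m in samples]))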

L.V. Model of Paired Compounds
Training: AB+ CD+
[Diagram: MAP model structure with latent variables $x_1, x_2$ over A, B, C, D.]
[Plots: animal responses (per min.) and the model's P(R | Test, D) for trained, transfer, and element test stimuli.]

L.V. Model of Asymmetric XOR
Training: A+ BC+ ABC-
MAP model structures grow with training:
  after 4 trials: one latent variable ($x_1$) over A, B, C
  after 10 trials: two latent variables ($x_1, x_2$)
  after 20 trials: three latent variables ($x_1, x_2, x_3$)
[Plots: animal responses (per min.) over trial blocks for A, BC, and ABC; model predictions P(R | A, D), P(R | B, C, D), and P(R | A, B, C, D) over trial blocks.]

What's a configuration?
The model can account for experiments traditionally deemed configural conditioning.
Previous models cast configuration as the result of stimuli being trained together.
We view it as the result of model-complexity pressure to group stimuli.

Expt: Second-order conditioning versus conditioned inhibition [Yin et al., 1994]

Group     A+   AB-   C+    Test   Result   Test   Result
No B      96    0     8     B      _        BC     Resp.
Few B     96    4     8     B      Resp.    BC     Resp.
Many B    96   48     8     B      _        BC     _

Bayesian Model of Second-Order Conditioning / Conditioned Inhibition
Training: A+ AB- C+
MAP model structures:
  after 4 trials: one latent variable ($x_1$) over A, B, C
  after 18 trials: two latent variables ($x_1, x_2$) over A, B, C
  after 18 trials (alternative): two latent variables ($x_1, x_2$) over A, B, C
[Plot: P(R | B, D), P(R | C, D), and P(R | B, C, D) as a function of the number of AB- pairings.]

Dealing with Reinforcement
Are reinforcers really just like any other stimulus?
Train: A+ B+
[Diagram: a single latent variable $x_1$ over A, B (and the reinforcer).]
Do animals do this?

Acquired Relational Equivalence [Honey & Watt, 1999]

Biconditional training                        Revaluation    Test
AY-food  AZ-no food  BY-food  BZ-no food      A-shock        A vs C
CY-no food  CZ-food  DY-no food  DZ-food      C-no shock     B vs D

[Plots: activity (%) at test for A vs C and for B vs D.]

Modeling Acq. Rel. Equiv.
[Plots: animal activity (%) and the model's 1 - P(R | Stim, D) for A vs C and for B vs D.]

Modeling Acq. Rel. Equiv.
Biconditional training: AY-food, AZ-no food, BY-food, BZ-no food; CY-no food, CZ-food, DY-no food, DZ-food. Revaluation: A-shock, C-no shock. Test: A vs C, B vs D.
[Diagram: learned model structure with latent variables $x_1, \dots, x_5$ over A, B, C, D, Y, Z, and Food.]

Expt: Food Devaluation [Holland, 1998]
Training trials: Phase 1: A-F1, B-F2; Phase 2: B-; Test: F1, F2.
(Trial counts listed on the slide: 0 16 16 28 6 40.)
[Plot: consumption (ml) of Food 1 and Food 2 vs. number of B-Food 2 trials.]

L.V. model of Devaluation
Training: A-Food1 / B-Food2, then B-.
[Diagrams: MAP model structures for few vs. many B-Food2 trials, with latent variables over A, B, Food 1, and Food 2.]
[Plots: consumption (ml) of Food 1 and Food 2 vs. number of B-Food 2 trials; model predictions 1 - P(R | Food 1, D) and 1 - P(R | Food 2, D) vs. number of B-Food 2 trials.]

Not the whole story... (Variant on Devaluation) [Holland, 1998]
Training trials: Phase 1: A-F1, B-F2; Phase 2: F(1,2)-; Test: B.
(Trial counts listed on the slide: 16, 16, 40, 2, 160.)
[Plot: % time in food cup for Food 1 and Food 2 vs. number of B-Food 2 trials (0, 16, 40, 160).]
Model structure? [Diagram: latent variables $x_1, x_2$ over A, B, Food 1, Food 2.]

Future Directions
Explore the priors: they are experimentally manipulable.
Remove the independent-trial assumption.

Modeling Change
Change should reflect our understanding of how the world is believed to change.
Example: drift in the causal model's parameters.
The marginal distribution of the diffusion process should reflect your prior.

Conclusions
Similarity and discrimination are recognized as the tradeoff between complexity and data fidelity arising in Bayesian inference.
A latent variable model is a natural (causal) setting for the study of classical conditioning.
It accounts for configural conditioning data and more.